Abstract

Current  research  in  employing  pattern  recognition  techniques  in  a  wireless  sensor 
network  (WSN)  to  detect  anomalous  or  suspicious  behavior  is  limited.  The  purpose  of 
this  research  was  to  determine  the  feasibility  of  an  accurate  tracking  and  intent  assessment 
system  of  unknown  or  foreign  radio  frequency  (RF)  emitters  in  close  proximity  to  and 
within  military  installations  as  a  method  for  physical  security. 

Twenty-two position tracks were collected using a hand-held Global Positioning System (GPS)
unit  and  a  training  data  set  from  five  different  features  was  generated  for  each  position 
track.  Each  collected  position  track  was  individually  classified  as  suspicious  or  non- 
suspicious  by  the  leave-one-out-cross-validation  (LOOCV)  method  using  four  different 
classification  methods.  The  four  classification  methods  used  in  this  research  were  the 
linear  discriminant  function  (LDF),  the  diagonal  linear  discriminant  function  (DLDF), 
the  quadratic  discriminant  function  (QDF)  and  the  Mahalanobis  distance  method.  The 
accuracies  and  false  positive/negative  error  rates  of  the  four  classification  methods  were 
compared  for  different  assessment  system  configurations.  Additionally,  best  fit  receiver 
operating  characteristic  (ROC)  curves  were  generated  for  each  classification  method  and 
discussed. 

The  QDF  classification  method  out-performed  the  other  three  classification  methods. 
This  classification  method  achieved  an  accuracy  of  95%  when  it  classified  the  22  position 
tracks  one  at  a  time.  The  lowest  false  positive  and  false  negative  rates  were  10%  and  0%, 
respectively. The prior probabilities for the non-suspicious and suspicious classes were
both set to 50% for this configuration.
RF EMITTER TRACKING AND INTENT ASSESSMENT

I.  Introduction 

This  chapter  introduces  this  research  and  provides  a  brief  background  on  the 
increasing  prevalence  of  geolocation  technologies  to  locate  emitters.  It  includes 
the  problem  statement,  research  objectives,  limitations,  equipment  required, 
and  a  section  communicating  the  importance  of  this  research  to  the  Department  of  Defense 
(DoD). 

1.1  Background 

Research  in  the  area  of  locating  and  tracking  Radio  Frequency  (RF)  signal-emitting 
sources  has  been  conducted  for  the  last  six  decades  [1].  The  process  of  determining 
an  unknown  position  of  an  emitter  is  called  source  localization  or  geolocation.  The 
capability  to  geolocate  an  emitting  object  is  currently  a  critical  requirement  necessary 
to  perform  certain  missions  of  the  United  States  (US)  Military.  The  ability  to  determine 
the  precise  location  of  US  military  personnel  on  the  earth  and  navigate  their  movement 
requires  geolocation.  Additionally,  millions  of  dollars  are  invested  each  year  into  the 
continuing  research  and  advancement  of  Location  Based  Services  (LBS)  in  the  commercial 
telecommunications  sector.  In  2010  there  were  6,000  location-based  applications  for  the 
iPhone,  900  for  the  Android  and  300  for  the  Blackberry  [2]. 

An  example  of  a  commercial  application  that  uses  geolocation  is  navigation. 
Currently,  geolocation  can  be  accomplished  by  both  terrestrial-based  (ground)  and  space- 
based  systems.  Global  Positioning  System  (GPS)  is  an  example  of  a  system  used  to 
geolocate objects in motion both on the surface of the earth and airborne. Both automobiles
and  airplanes  benefit  from  geolocation-aided  navigation  [3]. 

Geolocation  is  also  employed  in  the  military  for  locating  and  tracking  both  hostile 
and  foreign  unknown  targets.  Wireless  Sensor  Networks  (WSNs)  are  one  method  for 
accomplishing  this.  WSNs  can  be  used  to  determine  the  position  or  location  of  a  foreign 
emitter within the network. They are composed of lightweight devices referred to as
sensor  nodes.  These  multi-function  nodes  are  designed  to  sense  their  environment,  process 
data  and  communicate  with  each  other  using  radio  waves  [4].  Figure  1.1  is  a  diagram  of 
a  typical  WSN.  The  base  station  is  the  central  communications  focal  point,  receiving  and 
transmitting  pertinent  information  from  and  to  the  sensor  nodes. 


Figure  1.1:  A  Typical  WSN,  used  with  permission  [5]. 


With  the  addition  of  a  geolocation  method  to  record  estimated  foreign  emitter 
positions  at  certain  time  samples  within  the  network,  a  real-time  emitter  tracking  system 
can  be  created.  The  tracking  data  produced  by  this  system  can  be  used  to  generate 
feature  data  which  is  employed  by  a  pattern  recognition  mechanism  called  a  classifier. 
The classifier will use training data to determine if the emitter exhibits non-suspicious or
suspicious  (anomalous)  behavior. 

The  system  described  in  the  previous  paragraph  is  an  example  of  a  tracking  and  intent 
assessment  system.  A  human  can  be  placed  in  the  assessment  loop  and  is  cued  based  on  the 
classifier's decision that the position track is suspicious. Chapter II explains in detail
the different methods employed for geolocation. That chapter also explains how WSNs can
be used to detect anomalous behavior in the computer network or cyber domain, in
addition to the physical domain discussed in this section.

1.2  Problem  Statement 

Current  research  in  the  area  of  employing  pattern  recognition  techniques  for  detecting 
suspicious  or  anomalous  behavior  within  a  sensor  network  is  limited.  The  purpose  of 
this  research  was  to  determine  the  feasibility  of  an  accurate  intent  assessment  system  of 
unknown  or  foreign  emitters  for  military  installations  as  a  method  for  physical  security. 
Another  goal  of  this  research  effort  was  to  create  geolocated  position  tracks  in  real-time 
using  wireless  sensors  and  an  estimation  algorithm.  These  position  tracks  were  to  be 
employed  as  training  data  for  the  intent  assessment  system.  However,  due  to  hardware 
issues with the sensors, a hand-held GPS unit was instead used to create the position tracks.

1.3  Scope  and  Application 

Latitude-Longitude  (Lat-Long)  position  tracks  were  collected  using  a  GPS  unit  inside 
a predetermined area within Wright-Patterson Air Force Base (WPAFB), Area B. This data
collection area was 2,095 feet by 2,095 feet, or 4,389,025 square feet (about 100 acres). Google
Maps® was used to create an overhead satellite image of the data collection area. In addition,
this application was used to create a reference point in the data collection area to convert
the position track data in Matrix Laboratory® (MATLAB) from Lat-Long coordinates to x-y
coordinate data. Feature data was generated for each position track in the database and
fed into a classifier. The classifier classified each position track as non-suspicious or
suspicious using the Leave-One-Out Cross-Validation (LOOCV) method.
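
The Lat-Long to x-y conversion described above can be illustrated with a minimal Python sketch. It is not the MATLAB function used in this research; it assumes a flat-earth (equirectangular) approximation, a mean Earth radius of about 6,371 km, and a hypothetical reference point chosen only for illustration.

```python
import math

# Mean Earth radius (about 6,371 km) expressed in feet; an assumed constant.
EARTH_RADIUS_FT = 6_371_000.0 / 0.3048


def latlong_to_xy_feet(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Return (x, y) offsets in feet, east and north of the reference point.

    Uses a flat-earth (equirectangular) approximation, which is reasonable
    only for small areas such as a 2,095 ft by 2,095 ft collection site.
    """
    ref_lat = math.radians(ref_lat_deg)
    d_lat = math.radians(lat_deg - ref_lat_deg)
    d_lon = math.radians(lon_deg - ref_lon_deg)
    x = EARTH_RADIUS_FT * d_lon * math.cos(ref_lat)  # east-west offset
    y = EARTH_RADIUS_FT * d_lat                      # north-south offset
    return x, y


# Hypothetical reference point and GPS fix; not the coordinates used in the thesis.
REF_LAT, REF_LON = 39.7800, -84.1000
x_ft, y_ft = latlong_to_xy_feet(39.7810, -84.1000, REF_LAT, REF_LON)
```

Under these assumptions, a fix 0.001 degrees north of the reference point maps to an offset of roughly 365 feet along the y axis and zero along the x axis.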

A  considerable  amount  of  time  and  effort  was  spent  accomplishing  certain  tasks  for 
this  research.  Time  was  required  to  create  position  tracks  with  the  GPS  unit.  Additionally, 
an  interface  was  created  with  the  aid  of  Google  Maps®  to  precisely  plot  the  collected 
position  tracks  on  an  overhead  image  and  to  convert  image  pixel  distances  into  an 
acceptable  unit  of  measure  (feet).  A  third  task  undertaken  was  to  store  all  the  position 
tracks  in  a  single  matrix  in  MATLAB®  to  be  efficiently  passed  into  the  feature  generation 
algorithms.  These  tasks  are  described  in  detail  in  chapter  III. 

1.4  Research  Objectives 

The  main  objective  of  this  research  was  to  determine  if  position  tracks  collected  within 
a  WSN  could  be  accurately  classified  as  non-suspicious  or  suspicious.  Additionally,  a 
secondary  goal  was  to  determine  if  accurate  position  tracks  could  be  created  from  real-time 
geolocated  Received  Signal  Strength  (RSS)  data  of  an  emitter  in  a  WSN. 

1.5  Assumptions 

Certain  limitations  and  assumptions  were  established  for  this  research  effort.  First,  the 
emitter  that  was  tracked  inside  the  WSN  could  exist  in  a  state  of  motion.  Secondly,  the 
intent  assessment  system  designed  in  this  research  could  only  locate  and  track  one  emitter 
in  real-time. 

1.6  Equipment  Used 

A  Magellan®  Mobile  Mapper  GPS  unit  was  used  to  collect  the  position  tracks  for 
this  research.  Additionally,  MATLAB®  was  used  to  process  all  the  data  pertaining  to  this 
research  and  to  generate  the  results.  All  of  the  MATLAB®  code  developed  for  this  research 
was  original  except  for  three  functions.  The  first  function  imported  data  from  the  hand-held 
GPS  unit  into  MATLAB®.  The  second  function  converted  Lat-Long  coordinates  to  x-y 
coordinate data. A third function created landmark distance maps using landmark bitmap
images.  These  functions  were  written  by  a  prior  Air  Force  Institute  of  Technology  (AFIT) 
student. 

1.7  Motivation 

Protecting  the  integrity  of  military  installations  is  paramount  to  the  ongoing  operations 
of  the  US  Military.  Breakthroughs  in  this  research  area  would  inevitably  have  a  positive 
impact  on  the  security  of  both  domestic  and  deployed  military  bases  alike.  The  concept  of 
an  automated  intent  assessment  system  poses  multiple  advantages  to  the  military. 

This  system  would  reduce  the  required  security  manning  associated  with  perimeter 
surveillance by determining the intent of personnel approaching military installations [6].
By  reducing  the  manning  of  security  personnel,  the  operating  costs  required  to  keep  the 
installation  at  a  normal  operating  level  would  decrease.  The  security  personnel  displaced 
from  the  task  of  monitoring  perimeter  activity  could  be  employed  in  some  other  fashion. 

Additionally,  the  physical  security  infrastructure  of  the  installation  would  improve 
due  to  a  precise,  real-time  tracking  and  intent  assessment  system  installed  to  monitor  all 
RF  emitter  activity  within  close  proximity  to  and  inside  the  installation.  Suspicious  emitter 
activity  would  be  flagged  so  that  security  personnel  could  take  the  proper  action. 

1.8  Thesis  Organization 

This  thesis  is  organized  into  five  chapters.  Chapter  II  presents  a  background  on 
geolocation  and  pattern  recognition  methods  in  the  context  of  this  research.  Chapter  III 
explains  the  processes  and  methodologies  used  in  this  research.  Chapter  IV  presents  the 
analytical  results  of  this  research.  Finally,  chapter  V  summarizes  this  thesis,  provides  a 
conclusion  and  discusses  areas  of  possible  future  research. 
II. Background


This  chapter  introduces  the  theory  and  principles  relevant  to  RF  emitter  tracking 
and  intent  assessments.  The  fundamental  principles  of  source  localization  and 
pattern  recognition  are  discussed  as  well  as  current  research  thrusts  in  the  areas 
of  anomaly  detection  and  pattern  recognition  in  a  WSN  to  aid  in  computer  network  security. 

With the increased demand for heightened security in both the physical and cyber
domains,  an  emphasis  has  been  placed  on  advancing  research  in  the  area  of  developing 
sophisticated  algorithms  that  detect  and  prevent  intrusions  [4].  Pattern  recognition 
techniques  that  detect  anomalous  events  in  WSNs  are  viable  solutions  for  meeting  this 
demand.  No  existing  literature  was  found  on  previous  research  attempts  to  determine  the 
feasibility  of  RF  emitter  tracking  and  intent  assessment  systems  designed  specifically  to 
improve  a  military  installation’s  security. 

2.1  Source  Localization 

Geolocation  or  source  localization  is  the  process  of  estimating  the  position  or  location 
of  an  RF  transmitter.  There  are  multiple  ways  to  perform  geolocation.  Four  common 
geolocation methods are Time of Arrival (TOA), Time Difference of Arrival (TDOA), Angle
of  Arrival  (AOA)  and  RSS  [7]. 

A  conference  paper  introduced  an  RF  geolocation  and  tracking  system  for  the  US 
Navy.  The  paper  communicated  that  the  Navy  requires  accurate  detecting,  locating  and 
tracking  of  mobile  RF  emitters.  The  system  design  used  Direction  of  Arrival  (DOA) 
(similar  to  AOA)  as  the  geolocation  method  and  proposed  a  Kalman  filter  for  the  tracking 
mechanism  [8].  This  system  did  not  perform  intent  assessments  of  the  emitters. 
2.1.1 Time of Arrival

TOA  is  a  localization  technique  that  measures  the  absolute  time  at  which  a  transmitted 
signal  first  arrives  at  a  receiver.  If  two  receivers  are  in  operation,  TOA  data  will  determine 
the  estimated  transmitter  position  to  be  one  of  two  equally  probable  points.  If  three 
receivers  are  employed,  a  single,  precise  position  estimate  results  through  a  process 
called  tri-lateration.  This  process  involves  determining  the  emitter  position  by  using  the 
intersection  of  ranging  circles.  Figure  2.1  illustrates  the  tri-lateration  process.  A,  B  and  C 
are the sensors, X is the emitter, and $r_A$, $r_B$ and $r_C$ are the radial distances from each sensor to X.
It is possible in certain cases that tri-lateration will produce more than one estimate for the
emitter. Multi-lateration employs at least four sensors and improves the accuracy of the
localization process [1].

Figure 2.1: Tri-lateration TOA.
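
The intersection of three ranging circles can be sketched in Python as follows. This is an illustrative, noise-free example rather than the method used in this research: it assumes the three ranges have already been derived from TOA measurements, and it linearizes the problem by subtracting the circle equation of sensor A from those of sensors B and C. The function name and sensor coordinates are hypothetical.

```python
import math


def trilaterate(sensor_a, sensor_b, sensor_c, r_a, r_b, r_c):
    """Estimate the emitter (x, y) from three sensor positions and ranges."""
    (xa, ya), (xb, yb), (xc, yc) = sensor_a, sensor_b, sensor_c
    # Subtracting circle A from circles B and C leaves two linear equations.
    a11, a12 = 2.0 * (xb - xa), 2.0 * (yb - ya)
    a21, a22 = 2.0 * (xc - xa), 2.0 * (yc - ya)
    b1 = r_a**2 - r_b**2 + xb**2 - xa**2 + yb**2 - ya**2
    b2 = r_a**2 - r_c**2 + xc**2 - xa**2 + yc**2 - ya**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("sensors are collinear; no unique solution")
    # Solve the 2x2 system with Cramer's rule.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical layout: three sensors and an emitter at (30, 40).
A, B, C = (0.0, 0.0), (100.0, 0.0), (0.0, 100.0)
emitter = (30.0, 40.0)
ranges = [math.hypot(emitter[0] - s[0], emitter[1] - s[1]) for s in (A, B, C)]
estimate = trilaterate(A, B, C, *ranges)
```

With noisy ranges the circles generally do not meet at a single point, so a practical system would replace the exact solve with a least-squares fit.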


One  requirement  of  TOA  is  that  the  internal  clocks  in  all  the  devices  (including 
the  transmitter)  must  be  precisely  synchronized.  Given  the  high  propagation  speeds 
of  transmitted  signals,  very  small  discrepancies  in  time  synchronization  can  result  in 
very  large  errors  in  location  accuracy.  TOA-based  positioning  solutions  are  typically 
challenging  in  environments  where  large  amounts  of  multi-path,  interference,  or  noise  may 
exist. GPS is an example of a location system that uses TOA [1].
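
To give a sense of scale for the synchronization requirement, consider a clock error of one microsecond. This value is chosen only for illustration and does not come from the source. Taking the propagation speed as approximately $3.0 \times 10^8$ meters per second, the resulting range error is:

```latex
\Delta d = c \, \Delta t
         = \left(3.0 \times 10^{8}\ \tfrac{\text{m}}{\text{s}}\right)
           \left(1 \times 10^{-6}\ \text{s}\right)
         = 300\ \text{m}
```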

2.1.2 Time Difference of Arrival

TDOA  uses  relative  time  measurements  instead  of  the  absolute  time  measurements 
that  TOA  employs.  While  all  the  receivers  need  to  be  precisely  synchronized  in  TDOA, 
it  is  not  necessary  for  the  transmitter  to  also  be  synchronized.  Geolocation  with  TDOA 
is performed through a process called hyperbolic-lateration [1]. Figure 2.2 illustrates this
process. 




Figure  2.2:  Hyperbolic-lateration  TDOA. 


Hyperbolas  are  drawn  for  each  sensor  pair.  First,  the  constant  TDOA  is  calculated  for 
two  sensors  using  Equation  (2.1)  [9]: 

$$\mathrm{TDOA}_{B-A} = |T_B - T_A| = k, \tag{2.1}$$

where $\mathrm{TDOA}_{B-A}$ is the constant time difference between sensors A and B. It is calculated
using $T_A$ and $T_B$, the emitter's TOA at the two sensors, respectively. The constant difference
in time between the emitted signal's arrival at sensors A and B is $k$. The constant difference
in distance is then calculated using Equation (2.2) [9]:

$$|D_{XB} - D_{XA}| = kc, \tag{2.2}$$

where $D_{XB}$ and $D_{XA}$ are the distances from the emitter X to sensors B and A,
respectively. The constant time difference $k$ is multiplied by $c$, the speed of light
(approximately $3.0 \times 10^8$ meters per second), in order to achieve units of meters. The
hyperbola between the two sensors is drawn using Equation (2.3) [9]:

$$\sqrt{(x - x_A)^2 + (y - y_A)^2} - \sqrt{(x - x_B)^2 + (y - y_B)^2} = kc, \tag{2.3}$$

where $x_A$, $y_A$, $x_B$ and $y_B$ are the x-y coordinate locations of the two sensors, respectively.
The two variables in this equation are $x$ and $y$, which lie on the x-y coordinate plane and
represent all possible x-y coordinates of the hyperbola. The hyperbola's foci are located
at sensors A and B. This process is repeated for each sensor pair. The point
of intersection of the hyperbolas represents the estimated emitter position [9].
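
Equations (2.1) through (2.3) can be checked numerically with a short Python sketch. The sensor positions, emitter position and function names below are hypothetical, and the example is noise-free. The residual uses the absolute range difference from Equation (2.2), so it is zero on either branch of the hyperbola.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def range_difference(t_a, t_b):
    """Equations (2.1) and (2.2): k = |T_B - T_A|, returned as kc in meters."""
    k = abs(t_b - t_a)
    return k * SPEED_OF_LIGHT


def hyperbola_residual(x, y, sensor_a, sensor_b, kc):
    """Zero when (x, y) lies on the hyperbola with foci at the two sensors."""
    d_a = math.hypot(x - sensor_a[0], y - sensor_a[1])
    d_b = math.hypot(x - sensor_b[0], y - sensor_b[1])
    return abs(d_a - d_b) - kc


# Hypothetical geometry in meters.
sensor_a, sensor_b = (0.0, 0.0), (1000.0, 0.0)
emitter = (300.0, 400.0)

# Noise-free arrival times, measured from the (unknown) emission instant.
t_a = math.hypot(emitter[0] - sensor_a[0], emitter[1] - sensor_a[1]) / SPEED_OF_LIGHT
t_b = math.hypot(emitter[0] - sensor_b[0], emitter[1] - sensor_b[1]) / SPEED_OF_LIGHT

kc = range_difference(t_a, t_b)
residual = hyperbola_residual(emitter[0], emitter[1], sensor_a, sensor_b, kc)
```

Because only the difference of the two arrival times enters the calculation, any common offset in the emission time cancels, which is why the transmitter does not need to be synchronized.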

2.1.3  Angle  of  Arrival 

AOA  estimation  uses  a  non-lateral  approach  to  estimate  the  position  of  an  emitter.  It 
is  accomplished  through  a  phased-array  antenna  at  each  sensor  that  estimates  the  direction 
of the emitter's signal. The antenna is composed of a sensor array and a real-time adaptive
signal processor. The signal processor calculates and draws a Line-of-Bearing (LOB) for
each  sensor.  The  estimated  position  of  the  emitter  is  where  the  LOBs  intersect.  This  process 
is  called  triangulation. 

Only  two  receivers  need  to  be  employed  to  acquire  a  position  estimate.  Figures  2.3 
and  2.4  convey  how  AOA  is  used  to  triangulate  an  emitter’s  position  using  two  and  three 
receivers,  respectively. 
Figure 2.3: Triangulation using Two Receivers.

Figure 2.4: Triangulation using Three Receivers, used with permission [10].
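
Triangulation with two receivers amounts to intersecting two lines of bearing. The Python sketch below illustrates the idea under stated assumptions: bearings are noise-free and measured counterclockwise from the positive x axis, and the function name and coordinates are hypothetical.

```python
import math


def triangulate(p1, bearing1_rad, p2, bearing2_rad):
    """Intersect two lines of bearing and return the estimated (x, y)."""
    d1 = (math.cos(bearing1_rad), math.sin(bearing1_rad))
    d2 = (math.cos(bearing2_rad), math.sin(bearing2_rad))
    # 2-D cross product of the two direction vectors.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        raise ValueError("lines of bearing are parallel; no intersection")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    # Distance along the first line of bearing to the intersection point.
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return p1[0] + t1 * d1[0], p1[1] + t1 * d1[1]


# Hypothetical receivers and emitter.
receiver_1, receiver_2 = (0.0, 0.0), (100.0, 0.0)
emitter = (50.0, 50.0)
bearing_1 = math.atan2(emitter[1] - receiver_1[1], emitter[0] - receiver_1[0])
bearing_2 = math.atan2(emitter[1] - receiver_2[1], emitter[0] - receiver_2[0])
estimate = triangulate(receiver_1, bearing_1, receiver_2, bearing_2)
```

With three or more receivers and noisy bearings, the lines generally form a small polygon rather than a single point, and the estimate would be taken from within that region.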


The  antennas  in  each  sensor  antenna  array  must  be  synchronized  and  require 
processing  time  to  calculate  the  phase  differences  of  the  received  emitter  signal.  However, 
no  time  synchronization  is  required  between  one  AOA  sensor  and  another.  One  application 
of  AOA  is  the  processing  of  radar  signals  [1]. 

2.1.4  Received  Signal  Strength 

Geolocation  can  also  be  accomplished  through  RSS,  which  is  defined  as  the  voltage 
measured  by  a  receiver’s  Received  Signal  Strength  Indicator  (RSSI).  In  this  geolocation 
method, sensors are arranged in a particular configuration in the network. Each sensor
records the transmitted signal's RSS value in decibels (dB), which decreases approximately
linearly with the logarithm of the distance from the emitter to the sensor. The RSS sensors
report this value to a base station [1].

The base station collects the RSS values from all the sensors and inputs them into
a position estimator such as a Maximum Likelihood Estimator (MLE) [1]. A very noisy
position  estimate  can  be  determined  with  just  three  RSS  sensors.  As  the  number  of  RSS- 
reporting  sensors  in  the  network  increases,  the  accuracy  of  the  position  estimate  also 
increases. 
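
A minimal Python sketch of RSS-based position estimation is given below. It is not the estimator from the cited work. It assumes a log-distance path-loss model with hypothetical parameters (a reference power of -40 dB at unit distance and a path-loss exponent of 2) and independent Gaussian noise of equal variance at each sensor, in which case the maximum likelihood estimate is the point that minimizes the sum of squared RSS residuals. The search is a brute-force grid over a bounded search space.

```python
import math


def predicted_rss_db(emitter, sensor, p0_db=-40.0, path_loss_exp=2.0):
    """Log-distance model: RSS falls linearly with log10 of distance."""
    d = math.hypot(emitter[0] - sensor[0], emitter[1] - sensor[1])
    d = max(d, 1.0)  # clamp to the unit reference distance
    return p0_db - 10.0 * path_loss_exp * math.log10(d)


def mle_grid_search(sensors, measured_db, x_max, y_max, step=1.0):
    """Return the grid point whose predicted RSS best matches the measurements."""
    best_xy, best_cost = None, math.inf
    for i in range(int(round(x_max / step)) + 1):
        for j in range(int(round(y_max / step)) + 1):
            candidate = (i * step, j * step)
            cost = sum(
                (m - predicted_rss_db(candidate, s)) ** 2
                for s, m in zip(sensors, measured_db)
            )
            if cost < best_cost:
                best_xy, best_cost = candidate, cost
    return best_xy


# Hypothetical 100 x 100 search space with a sensor in each corner.
sensors = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
true_emitter = (30.0, 40.0)
noise_free = [predicted_rss_db(true_emitter, s) for s in sensors]
estimate = mle_grid_search(sensors, noise_free, x_max=100.0, y_max=100.0)
```

With noise-free measurements the search recovers the true position exactly; adding noise to `noise_free` spreads the estimates around it.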

RSS  is  a  popular  geolocation  method  because  it  is  simple  and  inexpensive  to 
implement.  It  should  be  noted  however,  that  as  the  number  of  sensors  used  in  the 
network  increases,  the  cost  to  implement  RSS  geolocation  increases.  Additionally, 
position  estimates  can  be  unpredictable  and  highly  inaccurate  due  to  the  variability  in  RSS 
measurements  [1]. 

The  task  of  geolocating  an  emitter  on  a  military  installation  in  real-time  would  require 
many  sensors  to  cover  the  large  area.  Since  RSS  sensors  are  inexpensive,  RSS  is  the  most 
appropriate  geolocation  method  for  this  research.  In  addition  to  being  cheap,  RSS  sensors 
are easy to work with; they measure only one variable. TOA, TDOA and AOA sensors
are  significantly  more  expensive  than  RSS  sensors. 

The  particular  configuration  that  RSS  sensors  are  arranged  in  will  affect  the  accuracy 
of  the  estimator  used  to  produce  x-y  position  estimates  of  the  emitter.  Figure  2.5  displays 
three  different  sensor  configurations  used  for  simulated  RSS  geolocation  scenarios.  An 
MLE  was  used  for  each  configuration  to  estimate  the  emitter  positions.  The  results  of 
the  three  configurations  were  compared  to  determine  which  configuration  yielded  the  best 
accuracy. 
Figure 2.5: Example of Simulated RSS Geolocation Scenarios, used with permission [7].


The  MLE  was  executed  1,000  times  for  each  of  the  five  emitters,  generating  1,000 
x-y position estimates for each emitter. Error ellipses for the Cramér-Rao Lower Bound
(CRLB) and MLE were calculated and plotted for all five emitters for each sensor
configuration. The CRLB presents the lower bound (or minimum) of an unbiased
estimator's covariance ($\sigma^2$). In other words, the variance of an unbiased estimator must
be greater than or equal to this bound. An estimator that achieves the CRLB is considered
efficient. The MLE is a popular estimator and is unbiased. It is unbiased because on
average, the estimated position of the emitter is correct. Additionally, it is known to
consistently achieve the CRLB and has a Gaussian or normally-distributed Probability
Density Function (PDF) [11].

The CRLB Confidence Interval (CI) ellipse bounds the area that 95% of the estimated
emitter positions would reside in if the estimator used were as efficient as possible (if the
minimum variance were achieved). The MLE CI ellipse bounds the area in which 95% of
the estimated emitter positions actually resided for the simulation. Figure 2.6 illustrates
the x-y position estimate simulation results for five emitters and the 95% CI CRLB and
MLE error ellipses for sensor configuration one. The MLE ellipse for emitter one was
exactly the same as the CRLB for that emitter. The MLE ellipses for emitters two, three
and four were slightly larger than the corresponding CRLB ellipses for those emitters.
The MLE ellipse for emitter five was not completely accurate because the estimates were
confined by the search space. The search space effectively biased the estimator, which
resulted in an MLE ellipse that appeared tighter than it actually was.
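
The geometry of a 95% error ellipse can be sketched in Python as follows. This is an illustration rather than the procedure from the cited simulation. It assumes the position estimates are Gaussian-distributed with a known 2 x 2 covariance matrix, in which case the ellipse is scaled by the 95% quantile of a chi-square distribution with two degrees of freedom, which equals -2 ln(0.05), or about 5.99. The function name and covariance values are hypothetical.

```python
import math


def error_ellipse_95(cov_xx, cov_xy, cov_yy):
    """Return (semi_major, semi_minor, angle_rad) of the 95% error ellipse."""
    # Chi-square 95% quantile for two degrees of freedom (closed form).
    scale = -2.0 * math.log(1.0 - 0.95)
    # Eigenvalues of the symmetric 2x2 covariance matrix.
    mean = (cov_xx + cov_yy) / 2.0
    spread = math.hypot((cov_xx - cov_yy) / 2.0, cov_xy)
    lam_major = mean + spread
    lam_minor = max(mean - spread, 0.0)
    # Orientation of the major axis, measured from the x axis.
    angle = 0.5 * math.atan2(2.0 * cov_xy, cov_xx - cov_yy)
    return math.sqrt(scale * lam_major), math.sqrt(scale * lam_minor), angle


# Hypothetical covariance: variance 4 along x, 1 along y, no correlation.
semi_major, semi_minor, angle = error_ellipse_95(4.0, 0.0, 1.0)
```

Supplying the CRLB covariance yields the bound ellipse, while supplying the sample covariance of the simulated estimates yields the corresponding MLE ellipse.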
Figure 2.6: Emitter Estimates and 95% CI CRLB/MLE Error Ellipses for Sensor
Configuration One [12].


Figure  2.7  illustrates  the  emitter  estimates  and  error  ellipses  for  sensor  configuration 
two’s  simulation  results.  The  MLE  ellipses  for  emitters  one,  three  and  four  extended 
significantly  in  both  the  positive  and  negative  x  direction  when  compared  to  the  CRLB 
ellipse for those emitters. The MLE ellipse for emitter two was very similar to the CRLB
ellipse.  The  MLE  ellipse  for  emitter  five  was  again  biased  by  the  search  space.  This 
configuration  was  the  least  ideal  because  the  variances  of  estimated  positions  of  all  five 
emitters  were  high. 
Figure 2.7: Emitter Estimates and 95% CI CRLB/MLE Error Ellipses for Sensor
Configuration Two [12].


Figure 2.8 illustrates the emitter estimates and error ellipses for sensor
configuration  three.  The  MLE  ellipses  for  emitters  one,  two,  three  and  four  were  only 
slightly  larger  in  area  than  their  CRLB  ellipse  counterparts.  The  estimates  for  emitters  one, 
three  and  four  were  very  tight.  The  MLE  ellipse  for  emitter  four  was  also  slightly  biased  by 
the  search  space.  The  MLE  ellipse  for  emitter  five  was  significantly  biased  by  the  search 
space. 

Overall,  sensor  configuration  three  produced  the  tightest  MLE  ellipses  for  all  five 
emitters.  Emitters  one,  three  and  four  had  very  tight  ellipses.  Emitter  four’s  estimates 
varied significantly and consequently had the largest CRLB and MLE ellipses. The farther
outside the sensor network the true emitter locations were, the higher the variance of the
estimates for those emitters.

Figure 2.8: Emitter Estimates and 95% CI CRLB/MLE Error Ellipses for Sensor
Configuration Three [12].

This  section  presented  a  brief  background  on  geolocation  and  four  common 
localization  techniques.  An  explanation  was  given  to  support  why  RSS  is  the  most 
appropriate  geolocation  method  for  an  RF  emitter  tracking  and  intent  assessment.  The 
results  of  simulated  RSS  geolocation  scenarios  for  three  different  sensor  configurations 
were  presented  and  discussed.  The  next  section  will  introduce  the  fundamentals  of  pattern 
recognition  and  its  applications  for  determining  the  intent  of  an  emitter  in  a  WSN. 
2.2 Pattern Recognition

Pattern  recognition  is  an  area  of  study  in  the  Artificial  Intelligence  (AI)  or  machine 
learning  research  field  and  has  many  applications  for  speech  recognition,  image  analysis 
and  cognitive  and  computer  science.  Pattern  recognition  is  the  field  of  processing  raw  data 
and  assigning  that  data  to  a  certain  class,  or  category.  The  four  main  stages  of  a  pattern 
recognition  system  are:  data  collection,  segmentation,  feature  generation,  and  classification 
[13].  All  four  stages  are  defined  in  this  section  and  an  explanation  is  given  concerning  how 
each  stage  applies  to  this  research  effort. 

2.2.1  Data  Collection 

In  the  data  collection  (or  sensing)  stage,  data  from  the  subjects  of  interest  are  collected 
and  stored  [13].  For  this  research,  Lat-Long  position  tracks  were  collected  using  the  GPS 
unit.  The  position  tracks  were  collected  in  real-time,  and  stored  in  a  database  used  as 
training  data  for  the  classifier.  The  intent  assessment  in  this  research  was  designed  to 
classify  a  position  track  as  suspicious  or  non-suspicious  in  real-time. 

2.2.2  Segmentation 

In  the  segmentation  stage,  the  collected  data  is  differentiated  from  one  subject  to 
another.  The  segmentation  process  can  be  challenging  if  there  are  unclear  or  unestablished 
baselines  to  delineate  one  subject  from  another  [13].  In  this  research,  the  intent  assessment 
was  designed  to  classify  only  one  position  track  in  real-time. 

2.2.3  Feature  Generation 

In  this  stage,  distinguishing  features  are  generated  from  the  subject’s  data  set  for 
classification.  If  more  than  one  feature  is  generated  from  a  subject  data  set,  a  feature  vector 
is  created  for  each  subject  [13].  In  this  research,  five  features  were  employed:  dwell  (or 
loiter  time),  repetition,  deviation  from  roads  and  parking  lots,  proximity  to  a  high-valued 
building,  and  proximity  to  a  water  tower.  Subpixel  detection  was  performed  to  determine 
which pixel in the overhead image a position track resided in for each time stamp. Chapter
III  discusses  the  feature  generation  process  for  this  research  in  detail. 

2.2.4  Classification 

The  task  of  assigning  an  object  to  a  class  using  the  feature  vector  is  defined  as 
classification.  A  classifier  that  assigns  a  sample  into  one  of  two  classes  is  formally  called 
a  dichotomizer  but  is  more  commonly  known  as  a  binary  classifier  [13].  A  classifier  that 
assigns  an  object  into  more  than  two  classes  is  called  a  polychotomizer  [13].  The  classifier 
used  in  this  research  was  a  binary  classifier  because  the  class  that  each  position  track  was 
assigned  to  was  either  suspicious  or  non-suspicious. 

The  degree  of  difficulty  in  classifying  objects  to  the  correct  class  is  directly 
proportional  to  the  variability  of  the  feature  generation  data.  The  feature  data  for  objects 
that  belong  to  the  same  class  can  be  varied  due  to  complexity  and  noise.  Bayesian  decision 
theory  is  a  common  statistical  process  employed  in  pattern  classification  [13]. 

There  are  many  pattern  recognition  methods  for  classifying  subjects  into  classes. 
The  four  classification  methods  used  to  classify  position  tracks  as  suspicious  or  non- 
suspicious  for  this  research  were:  the  Linear  Discriminant  Function  (LDF),  the  Diagonal 
Linear  Discriminant  Function  (DLDF),  the  Quadratic  Discriminant  Function  (QDF)  and  the 
Mahalanobis  classification  method.  The  next  five  subsubsections  briefly  describe  certain 
classification  methods. 
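
Whichever classifier is chosen, the leave-one-out cross-validation procedure used to evaluate it follows the same pattern: each sample is held out in turn, the classifier is trained on the remaining samples, and the held-out sample is then classified. The Python sketch below illustrates this loop. The nearest-class-mean classifier is a deliberately simple stand-in and is not one of the four methods used in this research; the two-feature data are hypothetical.

```python
def nearest_mean_classify(train_x, train_y, sample):
    """Stand-in classifier: assign the label of the closest class mean."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, value in enumerate(features):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        return sum(
            (s / counts[label] - v) ** 2 for s, v in zip(sums[label], sample)
        )

    return min(sums, key=squared_distance)


def loocv_accuracy(features, labels, classify):
    """Hold out each sample once, train on the rest, and score the result."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical two-feature samples forming two well-separated groups.
tracks = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
labels = ["non-suspicious"] * 3 + ["suspicious"] * 3
accuracy = loocv_accuracy(tracks, labels, nearest_mean_classify)
```

Any function with the same `(train_x, train_y, sample)` signature can be passed to `loocv_accuracy`, so the loop itself does not depend on the classification method.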

2.2.4.1  LDA 

Linear Discriminant Analysis (LDA), also known as Fisher's Linear Discriminant, is a
classification method that uses supervised training to reduce an N-dimensional data set of a
two-class classification problem to one dimension [13]. Multiple Discriminant Analysis (MDA)
is an extension of LDA applied to a multiple class problem. MDA reduces an N-dimensional
data set to a (C - 1)-dimensional data set, where C is the number of classes [13].
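
The reduction to one dimension can be sketched in Python for the two-feature, two-class case. The sketch uses the standard Fisher criterion, in which the projection direction is the inverse of the within-class scatter matrix applied to the difference of the class means. The function names and sample points are hypothetical and are not taken from this research.

```python
def class_mean(points):
    """Mean of a list of (x, y) points."""
    n = len(points)
    return sum(p[0] for p in points) / n, sum(p[1] for p in points) / n


def fisher_direction(class_one, class_two):
    """Return the 2-D Fisher projection direction for two classes."""
    m1, m2 = class_mean(class_one), class_mean(class_two)
    # Accumulate the within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for points, m in ((class_one, m1), (class_two, m2)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("within-class scatter matrix is singular")
    # Multiply the inverse scatter matrix by the difference of the means.
    dmx, dmy = m1[0] - m2[0], m1[1] - m2[1]
    return (syy * dmx - sxy * dmy) / det, (-sxy * dmx + sxx * dmy) / det


def project(w, point):
    """Project a 2-D point onto the direction w, giving one number."""
    return w[0] * point[0] + w[1] * point[1]


class_one = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
class_two = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0), (6.0, 6.0)]
w = fisher_direction(class_one, class_two)
projected_one = [project(w, p) for p in class_one]
projected_two = [project(w, p) for p in class_two]
```

After projection, each sample is described by a single number, and for this data the two classes occupy disjoint intervals on that axis.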
2.2.4.2 LDF


It  was  determined  that  MATLAB®’s  classify  function  uses  discriminant  functions  to 
classify  data.  LDFs  classify  a  data  set  by  determining  the  optimal  separating  hyperplane. 
Equation  (2.4)  is  the  general  function  of  an  LDF: 

$$f(x) = w^T x + w_0, \tag{2.4}$$

where $w$ is the weight vector, $x$ is the input vector (the data sample of an object to be
classified into a certain class) and $w_0$ is the bias or threshold. The input vector $x$ is classified
according to the conditional statements below:

$$x \in C_1 \text{ if } f(x) > 0,$$
$$x \in C_2 \text{ if } f(x) < 0,$$

where $C_1$ and $C_2$ denote class one and two, respectively. The hyperplane is a (d - 1)-dimensional
surface, where d is the number of features of the data [13]. In this research, d
was five; the five features were listed in Subsection 2.2.3. The weight vector $w$ can be
determined using the matrix form in Equation (2.5) [13]:

$$Xw = b, \tag{2.5}$$

where $X$ is the matrix of training data and $b$ is an arbitrary user-defined vector.
Equation (2.6) shows (2.5) in expanded form.

In (2.6) [13], $d$ is the number of features of the data and $n$ is the number of samples included
in the training data set, which was 22 for this research because the position track database
contained 22 position tracks. There are $d + 1$ columns in the training data matrix
($X$) of (2.6).

In $X$, the samples are rows and the features are columns. The dimensions of $X$ were 22
x 6 for this research. The dimension of $w$ was 6 x 1 to include the bias ($w_0$). The dimension
of $b$ was 22 x 1. There are many possible choices for $b$. One possible choice is a vector in
which every element is one.
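The expanded form of (2.5) can be sketched from the dimensions stated above. The layout below is an assumption rather than a reproduction of (2.6): it places a leading column of ones in $X$ so that the first element of $w$ is the bias $w_0$, and the ordering used in [13] may differ.

```latex
\[
\begin{bmatrix}
1 & x_{11} & \cdots & x_{1d} \\
1 & x_{21} & \cdots & x_{2d} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \cdots & x_{nd}
\end{bmatrix}
\begin{bmatrix} w_0 \\ w_1 \\ \vdots \\ w_d \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}
\]
```

With $n = 22$ and $d = 5$ this gives the 22 x 6, 6 x 1 and 22 x 1 dimensions quoted in the text.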

From (2.5), if $X$ is non-singular (or invertible), both sides of the equation can be multiplied
on the left by $X^{-1}$, as shown in Equation (2.7). Non-singular matrices have an inverse that exists
and can be calculated [13]:

$$X^{-1}Xw = X^{-1}b. \tag{2.7}$$

Equation (2.8) can then be used to solve for $w$ [13]:

$$w = X^{-1}b. \tag{2.8}$$

However, $X$ is typically singular (which means that the inverse of $X$ does not exist) [13].
In this research $X$ was 22 x 6, so it was not square and had no ordinary inverse.
While $X^{-1}$ cannot be computed, the hyperplane can still be determined. First, both sides
of (2.5) are multiplied on the left by $X^{T}$, as shown in Equation (2.9) [13].

$$X^{T}Xw = X^{T}b \tag{2.9}$$

Then, both sides are multiplied on the left by $(X^{T}X)^{-1}$, as shown in Equation (2.10) [13].

$$(X^{T}X)^{-1}(X^{T}X)w = (X^{T}X)^{-1}X^{T}b \tag{2.10}$$

Equation (2.10) simplifies because $(X^{T}X)^{-1}$ cancels $(X^{T}X)$ on the left side of
the equation. Equation (2.11) is the resulting equation solved for $w$.

$$w = \underbrace{(X^{T}X)^{-1}X^{T}}_{\text{pseudoinverse}}\,b \tag{2.11}$$

The product $(X^{T}X)^{-1}X^{T}$ is called the pseudoinverse and is multiplied by $b$ to determine $w$.
The bias ($w_0$) of the LDF function can be calculated when the value of $f(x)$ is 0. This is
demonstrated in Equation (2.12) [13].

$$0 = w^{T}x + w_0 \tag{2.12}$$

Equation (2.13) is the result when $w_0$ is subtracted from both sides and both sides are then
divided by $\|w\|$ [13], which is the $\ell_2$ norm of the hyperplane's weight vector.

$$\frac{w^{T}x}{\|w\|} = -\frac{w_0}{\|w\|} \tag{2.13}$$

When $w$ and $w_0$ are determined, any input vector $x$ can be classified into the appropriate
class using the discriminant function in (2.4) [13].
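The derivation in (2.9) through (2.11) can be illustrated with a short Python sketch. It is not the MATLAB® classify implementation; the function names are hypothetical, and it assumes targets of +1 for class one and -1 for class two, which yields the same normal equations as negating the class-two rows of X and using a vector b of all ones.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, y):
    """Solve a z = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [y[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def train_ldf(samples, targets):
    """Return [w0, w1, ..., wd] satisfying (X^T X) w = X^T b.

    samples: one list of d features per training sample.
    targets: +1 for class one, -1 for class two (an assumed choice of b).
    """
    x = [[1.0] + list(s) for s in samples]  # leading 1 carries the bias w0
    xt = transpose(x)
    xtx = matmul(xt, x)
    xtb = [sum(v * b for v, b in zip(row, targets)) for row in xt]
    return solve(xtx, xtb)


def ldf(weights, sample):
    """Evaluate f(x) = w^T x + w0 from Equation (2.4)."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], sample))


def classify(weights, sample):
    # f(x) > 0 selects class one; f(x) == 0 is not covered by the text
    # and is assigned to class two here as an arbitrary tie-break.
    return 1 if ldf(weights, sample) > 0 else 2
```

Solving the normal equations directly avoids forming the inverse of X^T X explicitly, which is a common numerical preference and gives the same w as (2.11) whenever that inverse exists.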


2.2.4.3 DLDF

The DLDF classification method is similar to the LDF method except that a diagonal
covariance matrix is used. A diagonal covariance matrix treats the five features used in this
research as independent of one another. A classifier that has a diagonal covariance matrix is called
a naive Bayes classifier [13].

2.2.4.4  QDF 

QDF  is  simply  an  extension  of  the  LDF  but  instead  uses  a  quadratic  function. 
Equation  (2.14)  is  the  general  form  for  the  quadratic  discriminant  function. 


$$f(x) = w^{T}x^{2} + w^{T}x + w_0 \tag{2.14}$$

In the same manner as (2.4), $w$ is the weight vector, $x$ is the input vector and $w_0$ is the bias.

2.2.4.5 Mahalanobis Method

The  Mahalanobis  method  was  the  fourth  classification  method  employed  in  this 
research.  It  is  fundamentally  different  than  the  previous  three  classification  methods.  This 
method  uses  a  distance  metric  to  classify  samples  into  classes.  It  makes  use  of  the  fact 
that  the  direction  of  a  data  set’s  variance  plays  an  important  role  in  classification  [13].  For 
example,  the  standard  deviation  of  the  distribution  of  data  could  resemble  an  ellipse,  as 
shown  in  Figure  2.9  [13]. 


Figure 2.9: Ellipse-shaped distribution of data represented by the standard deviation ($\sigma$)

This method can identify outliers in the data set by accounting for the standard
deviation ($\sigma$) of the data [13]. In Figure 2.9, the blue star data point has a greater Euclidean
distance from the data’s mean ($\mu$) than the red square; however, the red square data point
is outside the standard deviation ($\sigma$) of the data and thus can be considered an outlier.
Equation (2.15) is the general equation for the Mahalanobis method [13]:

$$D_M(x) = \sqrt{(x - \mu)^{T}S^{-1}(x - \mu)}, \tag{2.15}$$

where $x$ is the multivariate input vector, $\mu$ is a vector of means, $S$ is the covariance matrix
and $D_M(x)$ is the Mahalanobis distance of $x$ from $\mu$. The MATLAB® documentation for the
classify function states that Mahalanobis distances are calculated with stratified covariance
estimates.
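The situation described for Figure 2.9 can be reproduced numerically. The sketch below evaluates Equation (2.15) for two-dimensional data only; the function names and any mean, covariance or point values passed to it are illustrative assumptions and are not taken from the thesis data.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of 2-D point x from mean, for a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    # (x - mu)^T S^{-1} (x - mu)
    tmp = [inv[0][0] * dx[0] + inv[0][1] * dx[1],
           inv[1][0] * dx[0] + inv[1][1] * dx[1]]
    return math.sqrt(dx[0] * tmp[0] + dx[1] * tmp[1])


def euclidean(x, mean):
    return math.hypot(x[0] - mean[0], x[1] - mean[1])
```

For data that is spread widely along one axis, a point far out along that axis can be farther from the mean in Euclidean terms yet closer in Mahalanobis terms than a point displaced along the narrow axis.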

2.2.4.6 Classifier Terminology

The  classifier  will  always  classify  an  object  into  one  of  the  classes  defined  by  the 
problem.  In  this  research,  a  position  track  is  always  classified  as  either  suspicious  or  non- 
suspicious.  If  the  classifier  fails  to  classify  the  object  into  the  correct  class,  a  false  positive 
error  (or  Type  I  error)  or  a  false  negative  error  (Type  II  error)  occurs  [14].  These  errors  are 
now explained in the context of this research. First, the suspicious class of position tracks
is now referred to as $H_1$ and the non-suspicious class of position tracks is referred to as $H_0$.
A false positive error occurs when the classifier classifies a non-suspicious position track
as suspicious. In mathematical terms, $H_0$ is classified as $H_1$. A false positive error is also
called a ‘false alarm.’

Conversely, a false negative error occurs when the classifier incorrectly classifies a
suspicious position track as non-suspicious. In mathematical terms, $H_1$ is classified as $H_0$.
A  false  negative  error  is  also  called  a  ‘miss.’  The  consequences  of  a  false  negative  error 
are  more  severe  than  a  false  positive  error  because  security  personnel  would  not  engage 
or  investigate  a  suspicious  track  mis-classified  as  non-suspicious.  A  compromise  of  the 
military  installation  could  result. 

The probability or rate that a false positive error or false alarm occurs is expressed
mathematically as $P_F$ and is defined as $P[H_1|H_0]$. This expression reads, “the probability
of $H_1$ given $H_0$.” It is the probability that the classifier selects a position track as suspicious
given that it was non-suspicious. The probability or rate that a false negative error or miss
occurs is expressed mathematically as $P_M$ and is defined as $P[H_0|H_1]$. This expression
reads, “the probability of $H_0$ given $H_1$.” It is the probability that the classifier selects a
position track as non-suspicious given that it was suspicious.

When a classifier correctly classifies a suspicious track as suspicious it is referred to
as a ‘detect.’ The probability of detection is expressed as $P_D$ and is defined as $P[H_1|H_1]$. $P_D$
is the complement of $P_M$, as expressed in Equation (2.16).

$$P_D = 1 - P_M \tag{2.16}$$

When a classifier correctly classifies a non-suspicious track as non-suspicious it is referred
to as a ‘reject.’ The probability of rejection is expressed as $P_R$ and is defined as $P[H_0|H_0]$. $P_R$
is the complement of $P_F$, as expressed in Equation (2.17).

$$P_R = 1 - P_F \tag{2.17}$$
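The four rates defined above can be computed directly from a list of true and predicted labels. The following Python sketch is illustrative only; the function name and the 1/0 label encoding are assumptions.

```python
def error_rates(true_labels, predicted_labels):
    """Return P_F, P_M, P_D and P_R.

    Labels are 1 for suspicious (H1) and 0 for non-suspicious (H0).
    """
    fp = fn = tp = tn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth == 0 and pred == 1:
            fp += 1  # false alarm: H0 classified as H1
        elif truth == 1 and pred == 0:
            fn += 1  # miss: H1 classified as H0
        elif truth == 1 and pred == 1:
            tp += 1  # detect
        else:
            tn += 1  # reject
    if fp + tn == 0 or fn + tp == 0:
        raise ValueError("both classes must be present")
    p_f = fp / (fp + tn)
    p_m = fn / (fn + tp)
    return {"P_F": p_f, "P_M": p_m, "P_D": 1 - p_m, "P_R": 1 - p_f}
```

Each rate is normalized by the number of tracks in the true class, so P_F and P_M do not depend on how many suspicious and non-suspicious tracks the database happens to contain.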

2.2.4.7  Reporting  Methods 

There are multiple methods to report the results of the classifier. The accuracy, $P_M$
and $P_F$ can be graphed as a function of a sweeping parameter that is incremented from
one numeric value to another. The two parameters that were swept to generate classifier
statistics in this research were the grid cell width parameter ($w_{cell}$) and the prior probability
of the suspicious class ($P[H_1]$).

Receiver Operating Characteristic (ROC) curves can be generated which display $P_D$
vs. $P_F$ data. ROC curves communicate how often the classifier correctly classifies an object
that is a member of the class $H_1$ into the $H_1$ class, compared to how often it classifies an
object that is a member of the class $H_0$ into the $H_1$ class. Chapter IV presents the classifier
statistics for this research through the two parameter sweeps discussed in the previous
paragraph and $P_D$ vs. $P_F$ data which was used to generate best fit ROC curves.
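As a rough illustration of how $P_D$ vs. $P_F$ points arise, the sketch below sweeps a decision threshold over per-track scores. This is an assumption made for the example: the research swept the grid cell width and the prior probability rather than a score threshold, and the function name is hypothetical.

```python
def roc_points(scores, labels):
    """Return (P_F, P_D) pairs, one per candidate threshold.

    scores: larger values mean "more suspicious".
    labels: 1 for suspicious (H1), 0 for non-suspicious (H0).
    """
    n_h1 = sum(labels)
    n_h0 = len(labels) - n_h1
    if n_h0 == 0 or n_h1 == 0:
        raise ValueError("both classes must be present")
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for threshold in sorted(set(scores), reverse=True):
        detects = sum(1 for s, l in zip(scores, labels)
                      if s >= threshold and l == 1)
        false_alarms = sum(1 for s, l in zip(scores, labels)
                           if s >= threshold and l == 0)
        points.append((false_alarms / n_h0, detects / n_h1))
    return points
```

Lowering the threshold can only add flagged tracks, so both coordinates are non-decreasing along the returned list and the final point is always (1.0, 1.0).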

This section presented the fundamentals of pattern recognition. The four main stages
of this field were defined and applied in the context of this research. In the classification
subsection, the four classification methods employed in this research were introduced,
and equations were presented for them. Classifier terminology that is used in this thesis
was then discussed. Finally, the different ways of reporting the classifier statistics were
communicated. The next section will discuss the behavior classification process of an RF
emitter in a WSN.

2.3  Pattern  Recognition  Applied  to  WSNs 

As  stated  at  the  beginning  of  this  chapter,  one  solution  to  increasing  the  physical 
security  for  a  delineated  area  such  as  a  military  installation  is  to  employ  pattern  recognition 
techniques  in  WSNs  to  detect  anomalous  events.  In  section  1.1,  it  was  explained  that  pattern 
recognition  techniques  can  be  applied  to  geolocated  position  tracks  produced  by  WSNs  to 
determine  if  an  emitter  in  motion  is  malicious  or  not.  This  process  is  referred  to  as  anomaly 
detection  or  intent  assessment.  Anomaly  detection  is  a  subset  of  the  behavior  classification 
field  in  which  patterns  are  classified  into  two  classes:  suspicious  (or  malicious)  and  non- 
suspicious  [15].  Figure  2.10  illustrates  the  flow  from  collection  of  surveillance  target  data 
in a sensor network to intent classification.


Figure  2.10:  Diagram  of  the  Flow  From  a  Physical  Environment  to  Anomaly  Detection, 
used  with  permission  [15]. 


By  employing  a  geolocation  method  discussed  in  section  2.1  in  real-time,  an  estimated 
position  track  can  be  generated  and  recorded  to  classify  the  behavior.  RF  devices  that 
measure  TOA,  TDOA,  AOA  or  RSS  are  examples  of  sensors  that  could  be  used  in 
this  model.  The  five  features  generated  were  introduced  in  subsection  2.2.3:  dwell  (or 
loiter  time),  repetition,  deviation  from  roads  and  parking  lots,  proximity  to  a  high-valued 
building,  and  proximity  to  a  water  tower. 

The modeling algorithms used in this research were the four classification methods
discussed in subsection 2.2.4: LDF, DLDF, QDF and the Mahalanobis method. The
learning method used in this research was supervised, because each position track in the
database was labeled beforehand, during the data collection stage, as non-suspicious
or suspicious. Finally, the decision of normal or anomalous behavior at the end of the
flow chart is equivalent to the output of the intent assessment system in this research.
The two decision classes in this research’s assessment system were termed
suspicious and non-suspicious.

This  section  presented  a  top-level  view  of  the  flow  from  passive  surveillance  of  an 
RF  emitter  in  a  WSN  to  an  output  decision  that  assessed  the  emitter’s  intent.  The  next 
section  will  discuss  current  research  techniques  in  the  area  of  applying  pattern  recognition 
in  WSNs  to  detect  malicious  activity  in  the  cyber  domain. 

2.4  Pattern  Recognition  Applied  to  Cyber  Security 

Research into anomaly detection within a WSN is being conducted in the
computer network domain as well as the physical domain. Pattern recognition techniques
can  be  employed  in  sensor  networks  to  detect  anomalous  computer  network  activity.  This 
activity  could  result  in  a  cyber  or  computer  network-based  attack.  Neural  Networks  (NNs) 
are  one  pattern  recognition  tool  that  can  be  used  to  detect  cyber  attacks  [16].  NNs  are 
composed  of  interconnected  neurons  or  nodes  that  are  used  to  solve  AI  problems. 

One  example  of  a  cyber  attack  is  a  Distributed  Denial  of  Service  (DDoS)  attack  [4]. 
DDoS  attacks  consist  of  a  large  number  of  network  service  requests  towards  a  victim  node. 
A  DDoS  computer  network  attack  can  be  realized  in  a  WSN  by  attacking  a  target  sensor 
with the intent of exhausting the energy resources available to it. The targeted sensor
is  then  incapable  of  performing  further  sensing  operations.  Detection  of  DDoS  attacks  in  a 
WSN  can  be  accomplished  using  a  Graph  Neuron  (GN)  NN  [4].  The  algorithm  performs 
comparisons  of  current  network  traffic  patterns  to  a  database  of  normal  sensor  network 
traffic  [4]. 

Electrical  power  grids  and  substations  are  examples  of  infrastructures  that  can  be 
highly vulnerable to cyber attacks [16]. Research has thus been conducted specifically to
prevent these kinds of attacks. A cyber attack on a power substation could have the intent
of disrupting or denying the power supply to commercial or residential areas. Figure 2.11
illustrates a Probabilistic Neural Network (PNN) designed for detecting electrical faults
caused by cyber attacks in a power substation. A PNN is a type of NN called a feedforward
NN. In a feedforward NN, information always flows forward through the network, as
opposed to forwards and backwards [16].


Figure 2.11: Architecture of a PNN Used for Detecting Authentic Faults in Power
Substations, used with permission [16].

The input vector $X$ ($[x_1, x_2, x_3]$) is called the testing exemplar. The testing exemplar
elements $x_1$, $x_2$ and $x_3$ are voltage readings taken from different locations in the substation.
$P_A$ and $P_B$ represent the probabilities of a real fault and fake fault, respectively. If
$P_A/(P_A + P_B) > 0.5$, the exemplar data is classified into class A, or a real fault. If
$P_A/(P_A + P_B) < 0.5$, the exemplar data is classified into class B (a fake fault). A fake
fault  is  a  possible  indication  that  a  power  substation’s  fault  protection  system  has  been 
compromised  by  a  cyber  attack  [16]. 

In  this  section,  DDoS  attacks  were  defined  and  pattern  recognition  techniques  that 
use  WSNs  to  detect  malicious  cyber  activity  were  introduced.  The  top-level  architecture 
of  a  PNN  designed  to  classify  power  substation  data  into  real  and  fake  faults  was  briefly 
covered. 

2.5  Chapter  Summary 

This  chapter  outlined  principles  of  both  source  localization  and  pattern  recognition. 
Source  localization  was  defined  and  the  principles  of  TOA,  TDOA  and  AOA  were  covered 
briefly. Geolocation using RSS was discussed in greater detail and the results of RSS
simulations  of  five  emitters  in  three  different  sensor  configurations  using  an  MLE  were 
presented. 

Pattern  recognition  was  then  defined  and  the  four  main  stages  of  a  pattern  recognition 
system  were  described  in  detail.  Each  stage  was  explained  in  the  context  of  this  research.  In 
the  classification  stage,  equations  were  provided  for  the  four  classification  methods  used  in 
this  research,  with  an  emphasis  on  the  LDF  derivation.  Additionally,  classifier  terminology 
was  introduced  and  methods  of  reporting  classifier  statistics  were  covered. 

A brief overview was presented in the area of employing pattern recognition
techniques in a WSN to detect anomalous behavior in the physical domain. Finally, current
research that employs pattern recognition in WSNs to identify computer network (or cyber)
attacks was discussed. Chapter III applies the theory that was presented in chapter II
towards the methodologies used in this research.

III. Research Methodology


This  chapter  describes  the  methodologies  performed  to  collect  and  process  x-y 
position  tracks  using  the  Magellan®  Mobilemapper  GPS  unit,  generate  features 
from  the  collected  position  tracks  and  process  the  generated  feature  data  to 
perform  intent  classification  on  a  single  position  track  in  MATLAB®.  A  database  was 
created  of  position  tracks  collected  from  the  GPS  unit  which  was  used  by  the  classifier. 
This  database  was  comprised  of  position  tracks  intentionally  created  to  replicate  suspicious 
and non-suspicious activity on a military installation. Figure 3.1 presents the specific GPS
unit used to collect the position tracks in this research.

Figure 3.1: Magellan® Mobile GPS Unit in Autonomous GPS Mode.


3.1  Magellan®  GPS  Unit  Operation 

The  Magellan®  GPS  device  user  interface  was  straightforward  and  simple  to  operate. 
When  the  unit  was  acquiring  and  maintaining  communication  with  the  GPS  satellites, 
it  occasionally  alternated  between  the  autonomous  and  differential  GPS  modes.  This 
fluctuation  was  due  to  the  variability  of  the  GPS  constellation’s  signal  strength  in  different 
conditions.  The  GPS  unit’s  log  functionality  enabled  real-time  acquisition  of  Lat-Long 
coordinates  which  were  stored  over  time  to  create  a  position  track.  Each  position  track 
recorded by the GPS unit was collected within the physical area displayed by Figure 3.2.

Figure 3.2: Overhead Image of Position Track Collection Area.


This  image  is  a  section  of  Area  B,  WPAFB,  and  was  captured  using  Google  Maps®. 
The  dimensions  of  this  image  were  776  x  776  pixels.  Google  Maps®  was  also  used  to 
convert  the  x  and  y  axes  from  pixel  units  to  feet.  To  accomplish  this,  two  points  at  about 
the  same  pixel  row  were  picked  on  the  overhead  image  plotted  in  MATLAB®.  MATLAB® 
was  then  used  to  zoom  in  on  the  two  points  and  the  distance  between  the  two  points  in 
pixels  was  counted  and  recorded.  The  distance  scale  in  Google  Maps®  was  then  used  to 
measure  the  distance  in  feet  from  the  same  two  points. 

It was determined through dimensional analysis that one pixel corresponded to a distance of 2.7
feet. Since the dimensions of the overhead image were 776 x 776 pixels, the dimensions of
the image in feet were (776 * 2.7) x (776 * 2.7) or approximately 2,095 feet x 2,095 feet. The imagesc
command in MATLAB® was used to correctly scale the x and y axes. The next section
describes how the position tracks were imported into MATLAB® so feature generation
could be performed on them.
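The pixel-to-feet scaling amounts to a single ratio. The Python sketch below shows the arithmetic; the function names are hypothetical, and only the 2.7 feet per pixel scale and the 776 pixel image size come from the text.

```python
FEET_PER_PIXEL = 2.7     # scale reported in the text
IMAGE_SIZE_PIXELS = 776  # overhead image is 776 x 776 pixels


def scale_from_reference(distance_feet, distance_pixels):
    """Feet per pixel from one reference distance measured in both units."""
    if distance_pixels <= 0:
        raise ValueError("pixel distance must be positive")
    return distance_feet / distance_pixels


def pixels_to_feet(pixels, feet_per_pixel=FEET_PER_PIXEL):
    return pixels * feet_per_pixel


# Side length of the collection area in feet (about 2,095).
IMAGE_SIZE_FEET = pixels_to_feet(IMAGE_SIZE_PIXELS)
```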

3.2  Position  Track  Processing 

This  section  describes  the  process  of  importing  the  position  tracks  from  the  GPS  unit 
into  MATLAB®  and  converting  the  Lat-Long  coordinates  into  x-y  coordinates  for  feature 
generation. 

3.2.1  Importing  GPS  Data  into  MATLAB® 

Each  position  track  text  file  was  ported  into  a  personal  computer  workstation  through 
a  Universal  Serial  Bus  (USB)  connection.  A  function  extracted  the  Lat-Long  data  from 
the  National  Marine  Electronics  Association  (NMEA)  data  format  that  was  stored  by  the 
GPS unit. The Lat-Long data was stored in an L x 2 matrix where L was the length or total
number of time-stamps of a particular track. The two columns held the latitude
and longitude values, respectively. A third column was added to the matrix that
stored the time-stamp for each Lat-Long entry. This column started with a numeric value
of one and was incremented by one up to L (the last set of Lat-Long coordinates of a position
track had a time-stamp of L).
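The extraction function itself is not listed in the thesis. As a minimal sketch, assuming the logged file contains standard NMEA GGA sentences (an assumption, since the sentence types stored by the unit are not stated), the Lat-Long and time-stamp columns could be built as follows. The function names are hypothetical and checksums are not verified.

```python
def nmea_to_degrees(value, hemisphere):
    """Convert NMEA (d)ddmm.mmmm plus hemisphere letter to signed degrees."""
    raw = float(value)
    degrees = int(raw // 100)
    minutes = raw - degrees * 100
    decimal = degrees + minutes / 60.0
    return -decimal if hemisphere in ("S", "W") else decimal


def parse_track(lines):
    """Return rows [latitude, longitude, time_stamp] from GGA sentences.

    The time-stamp is a simple counter starting at one, as in the text.
    """
    track = []
    for line in lines:
        fields = line.strip().split(",")
        if not fields[0].endswith("GGA") or len(fields) < 6:
            continue  # not a position sentence
        if not fields[2] or not fields[4]:
            continue  # sentence logged without a position fix
        lat = nmea_to_degrees(fields[2], fields[3])
        lon = nmea_to_degrees(fields[4], fields[5])
        track.append([lat, lon, len(track) + 1])
    return track
```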

3.2.2  Lat-Long  to  x-y  Coordinate  Conversion 

It was then necessary to convert the L x 3 time-stamp-Lat-Long matrix for each
position track to a matrix of x-y coordinates corresponding to the locations on the overhead
imagery map in Figure 3.2. A function converted each Lat-Long time-stamp to an x-y
coordinate pair. It was necessary to determine the Lat-Long coordinate pair of the bottom-left
corner of the overhead image acquired from Google Maps®. The origin of the overhead
image was at the bottom-left corner and had the x-y coordinates (0,0). The newly created
time-stamp-coordinate matrices also had the dimensions L x 3 for each position track. The
first column of the matrix held the individual time-stamp. The second and third
columns stored the x and y coordinate values respectively.
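The conversion function is not reproduced in the thesis. One plausible approach, shown here purely as an assumption, is an equirectangular approximation about the bottom-left corner of the image, which is adequate over an area roughly 2,000 feet on a side. The Earth radius constant and function names are illustrative.

```python
import math

# Mean Earth radius (6,371,000 m) expressed in feet; an assumed constant.
EARTH_RADIUS_FT = 6_371_000.0 * 3.28084


def latlong_to_xy(lat, lon, origin_lat, origin_lon):
    """Feet east (x) and north (y) of the origin, equirectangular model."""
    x = (math.radians(lon - origin_lon) * EARTH_RADIUS_FT
         * math.cos(math.radians(origin_lat)))
    y = math.radians(lat - origin_lat) * EARTH_RADIUS_FT
    return x, y


def convert_track(track, origin_lat, origin_lon):
    """Rows [lat, lon, time_stamp] -> rows [time_stamp, x, y] in feet."""
    rows = []
    for lat, lon, time_stamp in track:
        x, y = latlong_to_xy(lat, lon, origin_lat, origin_lon)
        rows.append([time_stamp, x, y])
    return rows
```

The output column order follows the text: time-stamp first, then x, then y.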

Figure 3.3 illustrates a processed position track that was collected using the GPS unit.
The track is green because it was collected with the intent of modeling non-suspicious
behavior.

Figure 3.3: An Example Position Track Collected with the Magellan® GPS Unit.

The 22 position tracks in the position track database were all processed before feature
generation  on  them  was  performed.  The  next  section  explains  the  five  feature  generation 
algorithms  that  were  performed  on  the  position  tracks. 

3.3  Feature  Generation 

This  section  describes  the  processes  employed  to  generate  feature  data  from  the 
position  track  database.  The  five  features  used  in  this  research  were  first  introduced  in 
subsection  2.2.3  and  are  listed  again:  dwell  (or  loiter  time),  repetition,  deviation  from  roads 
and  parking  lots,  proximity  to  a  high-valued  building,  and  proximity  to  a  water-tower.  The 
dwell  and  repetition  feature  algorithms  required  the  use  of  a  variable  resolution  grid  which 
is  discussed  in  subsection  3.3.1.  The  remaining  three  features  belonged  to  a  category  of 
features  called  landmark  features.  The  algorithms  for  these  features  required  a  unique 
landmark  distance  map.  The  development  of  the  landmark  distance  maps  is  explained  in 
subsection  3.3.4. 

3.3.1 Development of Grid

It  was  necessary  to  create  a  2-dimensional  grid  in  MATLAB®  to  generate  data  for  the 
dwell  and  repetition  features.  The  grid  itself  was  square  in  shape,  and  partitioned  the  776 
x  776  pixel  overhead  image  into  square-shaped  cells  of  equal  area.  The  resolution  of  the 
grid  could  be  adjusted,  which  altered  the  total  number  of  cells  within  the  grid.  The  grid 
was  used  to  keep  track  of  which  cell  a  position  track  resided  in  for  a  given  time  stamp.  A 
significant  amount  of  book-keeping  was  required  for  this.  Once  the  dwell  and  repetition 
feature  algorithms  were  able  to  determine  which  cell  a  position  track  was  in,  code  could  be 
written  to  determine  when  a  position  track  left  the  current  cell  and  when  (if  ever)  it  returned 
to the current cell and how many times. Figure 3.4 illustrates a grid configuration with a
grid cell width $w_{cell}$ of 81 feet.

Figure 3.4: Variable Resolution Grid Developed for the Dwell and Repetition Features:
$w_{cell}$ = 81 feet.


Subsections  3.3.2  and  3.3.3  will  discuss  how  the  grid  designed  for  this  research  was 
used  to  generate  feature  data  for  the  dwell  and  repetition  features. 

3.3.2  Dwell 

The  first  feature  used  to  classify  position  tracks  as  suspicious  or  non-suspicious  was 
the  dwell  time  (or  loiter)  feature.  A  numeric  score  for  this  feature  was  determined  by 
counting  how  many  time  stamps  a  certain  position  track  remained  in  the  same  grid  cell. 
This was accomplished by traversing the position track starting at time-stamp one. The
x and y coordinates for each time-stamp were accessed and used to determine which
cell the position track was currently residing in. A local MATLAB® variable was incremented
every time a position track remained in the same cell as the next time-stamp was accessed.
A segment of MATLAB® code is included to illustrate how the dwell algorithm was
implemented.

max_count = 0;  % longest run of consecutive time-stamps in one cell
count = 0;      % length of the current run

% determine which cell the first estimate is in
x_est = track(1,2); y_est = track(1,3);
cell_x = ceil(x_est/grid_spacing);
cell_y = ceil(y_est/grid_spacing);
prev_x = cell_x; prev_y = cell_y;
for i = 2:size(track,1)
    x_est = track(i,2); y_est = track(i,3);

    % determine which cell the estimate is in
    cell_x = ceil(x_est/grid_spacing);
    cell_y = ceil(y_est/grid_spacing);
    if cell_x == prev_x && cell_y == prev_y
        % track stayed in the same cell
        count = count + 1;
        if count > max_count
            max_count = count;
        end
    else % track moved to new cell
        count = 0;
        prev_x = cell_x; prev_y = cell_y;
    end
end

Figure 3.5 illustrates a collected position track that generated a low dwell feature score.

Figure 3.5: Position Track with Low Dwell Time.


Figure  3.6  illustrates  a  collected  position  track  that  generated  a  high  dwell  feature 
score.  The  track  is  red  because  it  was  collected  with  the  intent  of  modeling  suspicious 
activity. 


[Plot of the position track over the collection area; the x (feet) axis and the vertical axis both span 0 to 2000.]

Figure  3.6:  Position  Track  with  High  Dwell  Time. 


This  subsection  described  how  the  dwell  time  feature  data  was  generated  for  the  22 
position  tracks.  Procedural  code  was  inserted  to  illustrate  specifically  how  the  algorithm 
operated.  Additionally,  two  position  tracks  were  shown  that  had  low  and  high  dwell  time 
scores.  The  next  subsection  describes  the  development  of  the  repetition  feature  algorithm. 

3.3.3  Repetition 

The  repetition  feature  was  employed  to  determine  if  a  position  track  exhibited 
repetitive activity within the data collection area. In the context of this research, repetition
was defined as the number of times a position track returned to the same grid cell. The
expression used to generate a score for repetitive position track behavior was:

$$\sum_{c=1}^{N} n_c^{2}, \tag{3.1}$$

where $N$ was the total number of grid cells in the current resolution configuration, $c$ was
the current grid cell and $n_c$ was the number of times that a particular position track returned
to the current grid cell $c$ during its collection. The exponent was a penalty factor and was set
to two for this research. Figure 3.7 illustrates a collected position track that exhibited low
repetitive activity.

Figure 3.7: Position Track with Low Repetition.

Figure 3.8 illustrates a collected position track that exhibited high repetition.

Figure 3.8: Position Track with High Repetition.


This  subsection  described  how  repetition  feature  data  was  generated  for  the  22  position 
tracks.  Equation  (3.1)  expressed  the  formula  to  calculate  a  repetition  score  for  each  position 
track.  Additionally,  two  position  tracks  were  presented  that  had  low  and  high  repetition 
scores.  The  next  subsection  describes  the  development  of  the  landmark  distance  maps  for 
the  three  landmark  features. 
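The repetition score of Equation (3.1) can be sketched in Python as follows. This is not the thesis code: the function name is hypothetical, and it assumes that a "return" is counted each time the track re-enters a cell it has previously left, with cells indexed by the same ceil rule used in the dwell listing.

```python
import math
from collections import defaultdict


def repetition_score(track, grid_spacing, penalty=2):
    """Sum over grid cells of (number of returns to the cell) ** penalty.

    track: rows [time_stamp, x, y] with x and y in feet.
    """
    returns = defaultdict(int)  # n_c for each cell that was re-entered
    visited = set()
    prev_cell = None
    for _, x, y in track:
        cell = (math.ceil(x / grid_spacing), math.ceil(y / grid_spacing))
        if cell != prev_cell:
            if cell in visited:
                returns[cell] += 1  # track came back to an earlier cell
            visited.add(cell)
            prev_cell = cell
    # Cells never returned to contribute zero, so summing only the
    # re-entered cells equals the sum over all N cells.
    return sum(n ** penalty for n in returns.values())
```

Squaring the per-cell count means that several returns to one cell score higher than the same number of returns spread over different cells.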

3.3.4  Development  of  Landmark  Distance  Maps 

As stated in the beginning of this section, it was necessary to develop distance maps
in MATLAB® for the three landmark features used in this research effort. This was
accomplished by first creating a bitmap image that highlighted the areas of interest for each
landmark. The dimensions of the bitmap image were the same as the data collection area
overhead image presented in Figure 3.2: 776 x 776 pixels. For the deviation from roads and
parking lots feature, pixels that encompassed either a road or parking lot were highlighted.
For the proximity to high-valued building and water tower features, the bitmaps highlighted
the pixels that encompassed these landmarks.

For the high-valued building and water tower bitmaps, it was relatively simple to
manually highlight the individual pixels that comprised those landmarks. However,
highlighting all the areas in the overhead image that were occupied by a road or parking lot
proved to be a formidable task. The most efficient way to accomplish this
was to use MATLAB® to search the original overhead image for pixels that had the
same shading as pixels known to belong to a road or parking lot. The deviation from
roads and parking lots bitmap was created this way. The three bitmaps created all had
dimensions of 776 x 776 pixels.

After the three bitmaps were created, the distance maps for the three landmark features
were created by a function in MATLAB®. This function took a 776 x 776 pixel
bitmap as input and returned a 776 x 776 pixel image for the given landmark feature. The
function searched the bitmap for the pixels that had values of zero. These pixels corresponded
to the highlighted areas of a given landmark feature. Then, pixels adjacent to the zero-valued
pixels were assigned a value of one because they were one pixel (or one unit of distance)
away from the landmark feature area.

Next, pixels that had not yet been assigned a value and were adjacent to pixels with a
value of one were set to two because they were two pixels (or two units of distance) away
from the landmark feature area. This process continued until every pixel in the 776 x 776 map
that was not part of the highlighted area (and originally set to zero) was assigned a non-zero
value. In this way, a landmark distance map was created for each of the three
landmark features used in this research.

In the case of the roads and parking lots distance map, if a pixel was 200 units away
from  a  certain  parking  lot  but  only  100  units  away  from  a  road,  the  numeric  value  of  100 
was  stored  to  the  pixel.  The  function  that  created  the  landmark  distance  maps  ensured  that 
any  pixel  in  the  map  that  was  not  a  member  of  the  highlighted  area  was  set  to  the  minimum 
distance  away  (or  closest  distance  to)  the  nearest  pixel  belonging  to  the  highlighted  area. 
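
A minimal sketch of this layer-by-layer expansion is given below. It is not the thesis function: the 5 x 5 bitmap is hypothetical, and "adjacent" is assumed to mean the four edge-sharing neighbours, which the text does not specify. Because every pixel is labeled the first time the growing region reaches it, each pixel ends up with the distance to its nearest highlighted pixel.

```matlab
% Hypothetical 5 x 5 bitmap: zero marks a highlighted (landmark) pixel
bitmap = ones(5, 5);
bitmap(3, 3) = 0;

reached = (bitmap == 0);          % pixels whose distance is already known
distMap = zeros(size(bitmap));    % landmark pixels keep a distance of zero
d = 0;

while any(reached(:)) && ~all(reached(:))
    d = d + 1;

    % Grow the reached region by one pixel up, down, left and right
    grown = reached;
    grown(2:end, :)   = grown(2:end, :)   | reached(1:end-1, :);
    grown(1:end-1, :) = grown(1:end-1, :) | reached(2:end, :);
    grown(:, 2:end)   = grown(:, 2:end)   | reached(:, 1:end-1);
    grown(:, 1:end-1) = grown(:, 1:end-1) | reached(:, 2:end);

    % Pixels reached for the first time are d steps from the landmark
    distMap(grown & ~reached) = d;
    reached = grown;
end
```

If diagonal neighbours were also treated as adjacent, the same loop would need four more shifted copies of the mask and would produce smaller values along the diagonals.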

This subsection explained how the landmark distance maps were created in MATLAB®.
The next three subsections describe the processes used to generate feature data using the
three landmark features. Figures of the landmark distance maps used in this
research are presented.

3.3.5  Deviation  from  Roads  and  Parking  Lots 

This feature reported the maximum distance by which a position track deviated from a
road or a parking lot within the confines of the overhead image. A local variable called
deviation was initialized to 0. The position track was then traversed, accessing the
x and y coordinates for each time-stamp. Each set of coordinates was mapped to the
corresponding pixel in the landmark distance map, that is, the pixel in which the position
track resided at that particular time-stamp. Then, the value of that pixel was read from the
distance map. If the pixel value was greater than the variable deviation, deviation was
set to the current pixel value. After the entire position track was traversed, the deviation
variable held the maximum deviation for that position track.

After  deviation  values  for  all  position  tracks  were  determined,  the  feature  found  the 
greatest  of  the  22  maximum  deviation  values.  All  22  maximum  deviation  values  were 
divided  by  this  value  to  normalize  the  deviation  scores  for  all  position  tracks  between  zero 
and  one.  A  segment  of  MATLAB®  code  is  included  to  illustrate  how  this  feature  algorithm 
was  implemented. 

deviation = 0;

for i = 1 : length(track)
    x = floor(track(i, 2));
    y = floor(track(i, 3));

    % x and y are swapped because the map is indexed as (row, column)
    current_value = roadParkingMap(y, x);

    % keep the largest distance seen so far along the track
    if current_value > deviation
        deviation = current_value;
    end
end
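
Once the loop has been run for every track, the normalization step described before the listing can be sketched as below. The four values in maxDeviation are placeholders, not thesis data, and the guard against an all-zero vector is an added assumption.

```matlab
% Hypothetical per-track maximum deviations (placeholder values)
maxDeviation = [12; 48; 30; 96];

% Divide every score by the largest one so all scores lie between zero and one
scale = max(maxDeviation);
if scale > 0
    normalized = maxDeviation / scale;
else
    normalized = maxDeviation;   % every track stayed on a road or parking lot
end
```

Because the scale is taken from the tracks in the database, adding or removing a track with the largest deviation changes the normalized scores of all other tracks.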

Figure  3.9  presents  the  landmark  distance  map  for  this  landmark  feature. 

Figure 3.9: Deviation from Roads and Parking Lots Distance Map.


This figure shows the areas of the map that were roads and parking lots (dark blue-
colored areas), and also the areas that were not roads and parking lots. Pixels farther
from a road or parking lot were colored according to their distance. Each pixel's
value in the map represented the distance to the closest parking lot or road.

This subsection described how feature data from the roads and parking lots distance
map was generated for the 22 position tracks. The maximum deviation from a road or
parking lot within the confines of the overhead image was calculated for each position track.
Each deviation value was divided by the maximum deviation of any track in the database to
scale all the deviation values from zero to one. The next subsection describes the process
used to generate feature data from the proximity to high-valued building landmark feature.

3.3.6 Proximity to High-valued Building

This  feature  reported  the  nearest  proximity  of  a  position  track  to  a  particular  building 
in  the  overhead  image.  This  building  was  located  just  north  of  the  AFIT  east  parking  lot  at 
the  Lat-Long  coordinates  [39.783944N,  84.081043W].  Four  satellite  dishes  were  enclosed 
behind  a  fence  at  the  south-east  corner  of  the  building.  The  building  was  selected  as  a 
high-valued building for the purpose of this research.

A local variable called maximumProximity was initialized to 60, the largest value stored
in this distance map. The position track was then traversed, accessing the x and y
coordinates for each time-stamp. Each set of coordinates was mapped to the corresponding
pixel in the landmark distance map, that is, the pixel in which the position track resided
at that particular time-stamp. Then, the value of that pixel was read from the distance map.
If the pixel value was less than the smallest value found so far, which was held in the
proximity variable, proximity was set to the current pixel value. After the entire position
track was traversed, the proximity variable held the closest proximity value to the
high-valued building for that position track.

After proximity values for all position tracks were determined, the algorithm found
the greatest of the 22 proximity values. All 22 proximity values were
divided by this value so that the now-scaled values were all real numbers between zero and
one. A segment of MATLAB® code is included to illustrate how this feature algorithm was
implemented.

maximumProximity = 60;
proximity = maximumProximity;   % start from the largest possible map value

for i = 1 : length(track)
    x = floor(track(i, 2));
    y = floor(track(i, 3));

    % x and y are swapped because the map is indexed as (row, column)
    current_value = HighValuedBuildingMap(y, x);

    % keep the smallest distance seen so far along the track
    if current_value < proximity
        proximity = current_value;
    end
end

Figure  3.10  presents  the  landmark  distance  map  for  this  landmark  feature. 

Figure 3.10: Proximity to High-valued Building Distance Map.

This  figure  shows  the  area  of  the  map  where  the  high-valued  building  was  located. 
The  maximum  value  that  any  pixel  in  the  distance  map  contained  was  60  feet.  This 
subsection  described  how  feature  data  was  generated  for  the  proximity  to  a  high-valued 
building  landmark  feature.  A  score  was  calculated  for  each  position  track  of  the  closest 
proximity  to  the  high-valued  building.  The  next  subsection  describes  the  process  used  to 
generate  feature  data  from  the  proximity  to  the  water  tower  landmark  feature. 

3.3.7 Proximity to Water Tower

This landmark feature reported the proximity to the water tower on Area B, with the
Lat-Long coordinates [39.784517N, 84.080928W]. This water tower was considered
high-valued for the purpose of this research. Any position track that came within close
proximity to it was flagged as suspicious.

A local variable called maximumProximity was initialized to 60, the largest value stored
in this distance map. The position track was then traversed, accessing the x and y
coordinates for each time-stamp. Each set of coordinates was mapped to the corresponding
pixel in the landmark distance map, that is, the pixel in which the position track resided
at that particular time-stamp. Then, the value of that pixel was read from the distance map.
If the pixel value was less than the smallest value found so far, which was held in the
proximity variable, proximity was set to the current pixel value. After the entire position
track was traversed, the proximity variable held the nearest proximity value to the water
tower for that position track. A segment of MATLAB® code is included to illustrate how this
feature algorithm was implemented.

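maximumProximity = 60;
proximity = maximumProximity;   % start from the largest possible map value

for i = 1 : length(track)
    x = floor(track(i, 2));
    y = floor(track(i, 3));

    % x and y are swapped because the map is indexed as (row, column)
    current_value = waterTowerMap(y, x);

    % keep the smallest distance seen so far along the track
    if current_value < proximity
        proximity = current_value;
    end
end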
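
The same minimum lookup can also be written without an explicit loop. The sketch below is an alternative formulation, not the thesis code; the 3 x 4 distance map and the three-row track (columns assumed to be time, x, y) are hypothetical placeholders.

```matlab
% Hypothetical 3 x 4 distance map and a short track: [time, x, y]
waterTowerMap = [3 2 1 2;
                 2 1 0 1;
                 3 2 1 2];
track = [0 1.2 1.7;
         1 2.6 2.1;
         2 4.9 3.3];

maximumProximity = 60;

% Convert coordinates to pixel indices; rows come from y, columns from x
x = floor(track(:, 2));
y = floor(track(:, 3));
idx = sub2ind(size(waterTowerMap), y, x);

% Smallest map value visited by the track, capped at the starting value
proximity = min([maximumProximity; waterTowerMap(idx)]);
```

Both forms assume that every floored coordinate is at least one and lies inside the map; a coordinate below one would need to be clamped or discarded first.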

Figure 3.11 presents the landmark distance map for this landmark feature.

Figure 3.11: Proximity to Water Tower Distance Map.


This  figure  shows  the  area  of  the  map  where  the  water  tower  was  located.  The 
maximum  value  that  any  pixel  in  the  distance  map  contained  was  60  feet.  This  subsection 
described  how  feature  data  was  generated  for  the  proximity  to  water  tower  feature.  A  score 
was  calculated  for  each  position  track  of  the  closest  proximity  to  the  water  tower. 

This section described the processes employed to generate feature data from
the  position  track  database.  The  first  subsection  described  the  development  of  the  grid 
necessary  to  calculate  data  using  the  dwell  and  repetition  features.  Then,  the  dwell  and 
repetition  feature  algorithms  were  explained.  The  development  of  the  landmark  distance 
maps  was  then  discussed.  Finally,  the  three  landmark  features  were  described.  The  next 
section  describes  how  each  position  track  was  classified  using  the  LOOCV  method. 




3.4  Classification  of  Position  Tracks 


This  section  details  the  process  of  individually  classifying  each  collected  position  track  in 
the  position  track  database  using  the  LOOCV  method.  Each  position  track  was  classified 
using  the  four  classification  methods  introduced  in  chapter  II.  Before  classification  could 
be  performed,  the  feature  generation  data  had  to  be  converted  into  three  matrices  for  the 
classify  function. 

3.4.1  Pre-classification  Data  Processing 

Before a position track could be classified as suspicious or non-suspicious, the
generated feature data needed to be processed into a format acceptable to the classify
function. First, the generated feature data was consolidated into a single matrix in
MATLAB®. This matrix had dimensions of 22 x 5 (22 tracks and five features). Next,
this matrix was split into two matrices, one for each class (suspicious and non-suspicious).
The suspicious class matrix had dimensions of 12 x 6 while the non-suspicious class
matrix had dimensions of 10 x 6. The additional column in these matrices stored the track
number of each position track for ease of error checking.

The two matrices grouped by class were then used to create the sample, training set
and group matrices required by MATLAB®'s classify function. The sample matrix
consisted of the particular position track to be classified as suspicious or non-suspicious
by the classify function. It was essentially a vector that consisted of the feature data
generated for the track to be classified. For this research, the dimension of the sample
matrix was always 1 x 5. While the classify function was able to classify more than
one position track simultaneously, position tracks were always classified one at a time in
this research because the LOOCV method was employed.

The training data matrix consisted of the generated feature data for the other 21 tracks
that were not being classified. In this research, the dimensions of this matrix were always
21 x 5. The group matrix always had dimensions of 21 x 1 and was essentially a grouping
vector for the training data matrix. It told the classify function which position tracks
in the training data matrix were pre-determined to be suspicious or non-suspicious.

The  value  in  a  particular  row  of  the  group  matrix  indicated  the  class  of  the  track 
corresponding  to  the  same  row  in  the  training  data  matrix.  If  a  certain  position  track  was 
suspicious,  a  zero  was  assigned  to  that  track’s  corresponding  row  in  the  group  matrix.  If 
a  position  track  was  designed  to  be  non-suspicious,  the  value  in  the  group  vector  was  set 
to  one.  For  example,  if  a  certain  track  in  the  fourth  row  of  the  training  data  matrix  was 
designed  to  be  a  suspicious  track,  the  fourth  row  of  the  group  matrix  was  set  to  zero. 
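
The construction of the three matrices for one held-out track can be sketched as follows. The feature values are random placeholders, and the ordering of the labels (the first 12 rows suspicious) is an assumption made only for illustration; the thesis does not list which track numbers belong to which class. The label convention (zero for suspicious, one for non-suspicious) follows the text.

```matlab
% Placeholder feature data: 22 tracks x 5 features (not thesis values)
features = rand(22, 5);

% Assumed ordering for illustration: 12 suspicious (0), then 10 non-suspicious (1)
labels = [zeros(12, 1); ones(10, 1)];

k = 9;                      % track to be classified

keep = true(22, 1);
keep(k) = false;            % leave the sample out of the training data

sample   = features(k, :);      % 1 x 5 sample matrix
training = features(keep, :);   % 21 x 5 training data matrix
group    = labels(keep);        % 21 x 1 group matrix
```

Row i of group always describes row i of training, because both are selected with the same logical mask.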

Code  was  written  in  MATLAB®  to  consolidate  the  generated  feature  data  for  the  22 
position  tracks  into  one  matrix,  create  the  two  matrices  separated  by  class  and  finally 
populate  the  sample,  training  data  and  group  matrices  required  for  the  classify  function. 
Figure  3.12  illustrates  example  sample,  training  data  and  group  matrices  used  in  this 
research.  The  first  value  in  the  sample  matrix  indicates  track  nine  was  the  position  track 
that  was  classified.  The  first  column  in  the  training  data  matrix  displays  the  position  track 
numbers.  This  column  was  inserted  for  error  checking  and  was  not  passed  into  the  classify 
function.  Columns  two  through  six  in  this  matrix  display  the  generated  feature  data  for  the 
position  tracks  in  the  following  order:  dwell,  repetition,  deviation  from  roads  and  parking 
lots, proximity to the high-valued building and proximity to the water tower.

[Figure 3.12 image: MATLAB® Command Window display of the sample, training data and group matrices.]

Figure  3.12:  Example  of  Sample,  Training  Data  and  Group  Matrices  in  MATLAB®. 


After  the  sample,  training  set  and  group  matrices  were  generated  in  MATLAB®, 
classification  of  a  single  position  track  using  the  four  different  classification  methods  could 
then  be  accomplished. 

3.4.2  Classification  in  MATLAB® 

Position  tracks  were  classified  as  suspicious  or  non-suspicious  with  MATLAB®’s 
classify  function  using  the  sample,  training  set  and  group  matrices  described  in 
subsection  3.4.1.  Each  position  track  in  the  database  was  classified  four  times  using  the 
LOOCV method by the four classification methods discussed in chapter II. The classify
function always returned either a zero or a one for each of the four classification methods
since the intent classifier in this research was a two-class scenario.

If  a  zero  was  returned,  the  position  track  in  the  sample  matrix  was  classified  as 
suspicious.  Conversely,  the  position  track  in  the  sample  matrix  was  classified  as  non- 
suspicious  if  a  one  was  returned.  After  a  position  track  was  classified  as  suspicious  or 
non-suspicious,  the  results  were  stored  for  statistical  processing  of  the  four  classification 
methods.  Chapter  IV  will  discuss  the  results  of  the  four  classification  methods. 
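
The leave-one-out procedure can be sketched as a loop over the tracks. To keep the example free of toolbox dependencies, the sketch below does not call classify; it substitutes a hand-written rule that assigns the sample to the class with the smaller Mahalanobis distance, using a separate covariance estimate per class. The two-feature data set is synthetic and the variable names are hypothetical.

```matlab
% Synthetic two-feature data (placeholders, not the thesis feature values)
features = [0.8 0.7; 0.9 0.8; 0.7 0.9; 0.8 0.9; 0.9 0.7; ...   % class 0
            0.2 0.1; 0.1 0.2; 0.3 0.2; 0.2 0.3; 0.1 0.1];      % class 1
labels = [zeros(5, 1); ones(5, 1)];   % 0 = suspicious, 1 = non-suspicious

nTracks = size(features, 1);
predicted = zeros(nTracks, 1);
classes = [0 1];

for k = 1:nTracks
    % Hold out track k; the remaining tracks form the training set
    keep = true(nTracks, 1);
    keep(k) = false;
    sample   = features(k, :);
    training = features(keep, :);
    group    = labels(keep);

    % Squared Mahalanobis distance from the sample to each class
    dist = zeros(1, 2);
    for c = 1:2
        Xc = training(group == classes(c), :);
        delta = sample - mean(Xc, 1);
        dist(c) = (delta / cov(Xc)) * delta';
    end

    % Assign the class with the smaller distance
    [~, best] = min(dist);
    predicted(k) = classes(best);
end

% Fraction of held-out tracks assigned to their true class
accuracy = mean(predicted == labels);
```

With the Statistics Toolbox available, the inner block would presumably be replaced by a call of the form classify(sample, training, group, type), where type selects the discriminant function.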

This section explained the process performed to individually classify each collected
position track in the position track database using the LOOCV method. Each position
track  was  classified  using  the  four  classification  methods  introduced  in  chapter  II. 
Subsection  3.4.1  explained  how  the  generated  feature  data  was  organized  into  the  sample, 
training  data  and  group  matrices  needed  for  classification.  Subsection  3.4.2  communicated 
how  classification  of  the  position  tracks  was  performed  in  MATLAB®. 

3.5  Chapter  Summary 

This  chapter  described  the  processes  and  methodologies  that  were  performed  to  collect 
and  process  x-y  position  tracks  using  the  Magellan®  Mobilemapper  GPS  unit,  generate 
features  from  the  collected  position  tracks  and  then  process  the  generated  feature  data  to 
perform  intent  classification  on  a  single  position  track  in  MATLAB®. 

Chapter  IV  will  discuss  the  feature  generation  and  classification  results  produced  in 
this  research  using  the  methodologies  that  were  described  in  this  chapter. 

IV. Results and Analysis


This  chapter  discusses  the  feature  generation  and  classification  results  produced 
in  this  research  using  the  methodologies  described  in  chapter  III.  A  detailed 
analysis  of  these  results  is  performed.  Each  position  track  in  the  database 
was  individually  classified  using  the  LOOCV  method  by  the  four  classification  methods 
discussed  in  chapter  II.  Cross-sectional  feature  generation  data  plots  that  include 
discriminant  lines  and  curves  from  the  four  classification  methods  are  presented  and 
discussed  in  section  4.1.  The  classifier  statistics  and  results  are  presented  and  analyzed 
in  section  4.2. 

4.1  Feature  Generation  Results 

Cross-sectional feature generation figures are presented and explained in this section.
Figure 4.1 shows a cross-sectional plot of the feature data generated by the dwell time
and repetition features. A grid cell spacing of 162 feet was chosen for this figure because
it allowed for the greatest separation between suspicious and non-suspicious tracks. The
discriminant function lines and curves from the four classification methods discussed in
chapter II are also shown.

Figure 4.1: Dwell Time vs. Repetition: w_cell = 162 feet, Classified Track = 9.


The red upside-down triangle marks the position track that was classified. Track nine was
the position track classified in all of the two-dimensional cross-sectional plots in
this section. The green circles and red x's represent the non-suspicious and suspicious
position tracks used as training data for the classifier, respectively. The discriminant
function lines and curves display the decision boundaries for each classification
method. A region labeled H0 indicates where the particular classification
method would classify a position track as non-suspicious. Likewise, a region
labeled H1 indicates where the particular classification method would classify
a position track as suspicious.

The position and shape of the discriminant function lines and curves depended on
which track was being classified, because the held-out track determined the training data
supplied to the classifier. Feature data from only the dwell time and
repetition features were used by the classifier in Figure 4.1. All four methods correctly
classified track nine as suspicious. P[H0] and P[H1] were both set to 0.5 in MATLAB®
when the classify function was executed. All position tracks that had a repetition score
of about 0.1 and greater were suspicious, while some non-suspicious position tracks had a
dwell time score as high as 0.5.

Figure 4.2: Deviation from Roads and Parking Lots vs. Repetition: w_cell = 162 feet,
Classified Track = 9.


Figure 4.2 shows a cross-sectional plot of the deviation from roads and parking lots
vs. repetition generated feature data. Again, the red upside-down triangle represents
track nine, the position track that was classified. Feature data from only the
deviation from roads and parking lots and repetition features were used by the classifier.
All four classification methods correctly classified track nine as suspicious. P[H0] and
P[H1] were both set to 0.5 in MATLAB® when the classify function was executed. A grid
cell spacing of 162 feet was again chosen for this figure because it allowed for the greatest
separation between suspicious and non-suspicious tracks. While all position tracks that had
a repetition score of about 0.1 and greater were suspicious, some non-suspicious position
tracks had a deviation from roads and parking lots score as high as 0.9.

Figure 4.3: Proximity to Water Tower vs. Dwell Time: w_cell = 162 feet, Classified Track = 9.


Figure  4.3  shows  a  cross-sectional  plot  of  the  proximity  to  the  water  tower  vs.  dwell 
time  generated  feature  data.  Track  nine  was  the  position  track  that  was  classified.  Feature 
data  from  only  the  dwell  time  and  proximity  to  water  tower  were  used  by  the  classifier.  The 
LDF,  DLDF  and  Mahalanobis  methods  correctly  classified  track  nine  as  suspicious,  while 
the  QDF  method  incorrectly  classified  it  as  non-suspicious. 

One possible reason the QDF curve was pulled to the right in this figure is that
track 16 had a dwell score of 0.5 but was collected with the intent of being
non-suspicious. If this position track were not included in the training data, the QDF
method may have correctly classified track nine as suspicious. A grid cell spacing of
162 feet was again chosen for this figure because it allowed for the greatest separation
between suspicious and non-suspicious tracks.

Figure 4.4 shows a cross-sectional plot of the proximity to the high-valued building
vs. repetition generated feature data. Track nine was the position track that was classified.
Feature data from only the repetition and proximity to high-valued building features were
used by the classifier. All four classification methods correctly classified track nine as
suspicious. A grid cell spacing of 162 feet was again chosen for this figure because it
allowed for the greatest separation between suspicious and non-suspicious tracks. While all
position tracks that had a repetition score of about 0.1 and greater were suspicious, some
non-suspicious position tracks had a proximity to the high-valued building score of
almost 0.7.

Figure 4.4: Proximity to High-Valued Building vs. Repetition: w_cell = 162 feet, Classified
Track = 9.


Four  cross-sectional  feature  generation  data  plots  that  included  discriminant  function 
lines  and  curves  from  the  four  different  classification  methods  were  presented  and  discussed 
in  this  section.  These  figures  verified  the  functionality  of  the  five  feature  generation 
algorithms.  Additionally,  it  was  shown  that  the  discriminant  function  lines  and  curves 
for  the  four  classification  methods  varied  appropriately  depending  on  which  position  track 
in  the  database  was  classified.  The  next  section  presents  the  classifier  results  and  statistics. 

4.2  Classifier  Results 

This section communicates the quantitative results of the classifier employed in this
intent assessment system. Subsection 4.2.1 presents and analyzes the results generated
from a sweep of the grid resolution parameter, w_cell. In subsection 4.2.2, results from a
sweep of the prior probability of the suspicious class (P[H1]) are presented and discussed.
Finally, subsection 4.2.3 presents PD vs. PF data for each classification method and the
corresponding best fit ROC curves.

4.2.1  Grid  Resolution  Sweep  Results 

In this subsection, a sweep of the grid cell dimension parameter w_cell (in feet) was
performed on the classifier and the results are presented and analyzed. This parameter
changed the resolution of the grid, which affected the data generated by the dwell and
repetition features.

The classifier performance and behavior varied as w_cell changed because the dwell and
repetition feature generation data depended on the grid resolution. The grid cell spacing
parameter w_cell was incremented from 2.7 feet (or 1 image pixel unit) to 1,026 feet (380
image pixel units). The maximum value for w_cell did not exceed 1,026 feet because at that
grid resolution there were only four grid cells. If w_cell were set to a longer cell dimension,
the entire 2,095 feet x 2,095 feet grid would have been only one cell. The dwell and
repetition algorithms would have failed to produce accurate feature data.
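
The arithmetic behind these limits can be checked with a short sketch. The conversion factor of 2.7 feet per pixel and the 776-pixel grid side come from the text; counting only whole cells per side (the floor operation) is an assumption, chosen because it reproduces both the four-cell and the one-cell statements above.

```matlab
feetPerPixel = 2.7;                 % one image pixel unit, from the text

gridFeet    = 776 * feetPerPixel;   % side length of the full grid in feet
maxCellFeet = 380 * feetPerPixel;   % largest w_cell used in the sweep

% Assumed counting rule: only whole cells along each side are counted
cellsPerSide = floor(gridFeet / maxCellFeet);
totalCells   = cellsPerSide ^ 2;

% A longer cell, for example 400 pixels, leaves a single whole cell per side
cellsPerSideLonger = floor(gridFeet / (400 * feetPerPixel));
```

Under this counting rule, any cell side longer than half the grid side (388 pixels, about 1,048 feet) collapses the grid to a single cell.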

The parameter sweep consisted of 22 different w_cell values. The classifier classified
each position track in the database for each w_cell value, using the four classification
methods. The prior probability values P[H0] and P[H1] for the non-suspicious and
suspicious classes, respectively, were both set to 0.5 in MATLAB® for the entire sweep.
Figure 4.5 illustrates the overall accuracy of the classifier for the grid resolution sweep.

On average, the QDF classification method outperformed the other three classification
methods  in  terms  of  overall  accuracy.  The  definition  of  accuracy  in  the  context  of  this 
research  was  the  rate  at  which  the  classifier  correctly  classified  a  position  track  from 
the  database  as  non-suspicious  or  suspicious,  using  the  LOOCV  method.  The  average 
performance  for  the  QDF  method  over  the  entire  grid  resolution  spectrum  was  74.14%. 

The  Mahalanobis  classification  method  performed  the  worst  of  the  four  classification 
methods.  The  average  performance  of  the  Mahalanobis  method  for  the  grid  resolution 
sweep  was  58.26%.  While  the  Mahalanobis  classification  method  had  the  worst  average 
accuracy,  it  still  performed  (only  slightly)  better  than  the  random  guess  accuracy  rate 
(50%). 

A classification method that randomly classifies a position track as suspicious or
non-suspicious is accurate about 50% of the time. Therefore, a classification method with a
defined algorithm for classifying data that has an accuracy rate of 50% or less is no better
than the random guess method.

The best accuracy in the entire plot was achieved by the QDF classification method
when w_cell was 270 feet. The QDF accuracy for that w_cell value was 95.45%. The worst
accuracy displayed in the plot was produced by the LDF classification method when w_cell
was 756 feet. The LDF accuracy for that w_cell value was 36.36%. If a military installation
were picking a classification method solely based on maximizing the classifier accuracy,
the QDF method should be selected, with w_cell set to 270 feet.

Tables 4.1 and 4.2 present confusion matrices using all five features for the four
classification methods for the grid resolution sweep. These matrices present the classifier
statistics when w_cell was set to 270 feet. The rows of the confusion matrices indicate the
actual classes that the position tracks belonged to and the columns indicate the classes that
the classification methods assigned to each position track.


                         LDF                           DLDF
Actual class      Non-suspicious  Suspicious    Non-suspicious  Suspicious
Non-suspicious          8             2               7             3
Suspicious              3             9               4             8

Table 4.1: Confusion Matrix for LDF and DLDF Methods: w_cell = 270 feet.


The  matrices  display  the  number  of  position  tracks  in  the  database  that  were  correctly 
classified  into  each  class,  and  the  number  of  tracks  that  were  incorrectly  classified  into 
each  class.  The  LDF  classification  method  correctly  classified  eight  non-suspicious  tracks 
in  the  database  as  non-suspicious.  Two  non-suspicious  tracks  were  incorrectly  classified  as 
suspicious. Three suspicious tracks were incorrectly classified as non-suspicious and nine
suspicious  tracks  were  correctly  classified  as  suspicious. 

The  DLDF  classification  method  correctly  classified  seven  non-suspicious  tracks  in 
the  database  as  non-suspicious.  Three  non-suspicious  tracks  were  incorrectly  classified  as 
suspicious.  Four  suspicious  tracks  were  incorrectly  classified  as  non-suspicious  and  eight 
suspicious  tracks  were  correctly  classified  as  suspicious. 


                         QDF                           Mahalanobis
Actual class      Non-suspicious  Suspicious    Non-suspicious  Suspicious
Non-suspicious          9             1               5             5
Suspicious              0            12               0            12

Table 4.2: Confusion Matrix for QDF and Mahalanobis Methods: w_cell = 270 feet.


The  QDF  classification  method  correctly  classified  nine  non-suspicious  tracks  in 
the  database  as  non-suspicious.  One  non-suspicious  track  was  incorrectly  classified  as 
suspicious.  No  suspicious  tracks  were  incorrectly  classified  as  non-suspicious  and  all  12 
suspicious  tracks  were  correctly  classified  as  suspicious. 

The  Mahalanobis  classification  method  correctly  classified  five  non-suspicious  tracks 
as  non-suspicious.  The  other  five  non-suspicious  tracks  were  incorrectly  classified  as 
suspicious.  No  suspicious  tracks  were  incorrectly  classified  as  non-suspicious  and  all  12 
suspicious  tracks  were  correctly  classified  as  suspicious. 

The false negative error statistic PM vs. w_cell and the false positive error statistic PF
vs. w_cell performances are now presented for the grid resolution sweep. Figure 4.6 presents
the false negative error statistic PM of the classifier as a function of w_cell.

PM was defined mathematically as P[H0|H1] (H0 picked when H1 was true) and
was the ratio of position tracks incorrectly classified as non-suspicious compared to the
total number of suspicious tracks in the database. It was the likelihood that the classifier
missed a suspicious track and classified it as non-suspicious. This metric was potentially
more important to security personnel than the overall classifier accuracy because the
consequences of mis-classifying a suspicious position track as non-suspicious were severe.
A hostile intruder could compromise an installation’s security without its activity being
flagged as suspicious.

On average, the Mahalanobis classification method had the lowest PM rate: 3.03%. The
LDF and DLDF classification methods had the worst PM rates at 48.86% and 49.24%,
respectively. The Mahalanobis classification method yielded a PM of 0% for 14 of the
22 different w_cell values, or more often than not. The QDF classification method had an
average PM of 13.64%. This classification method also yielded a PM of 0% for five of the
22 different w_cell values. When w_cell was set to 270 feet, the QDF classification method
had a PM of 0%.

If a military installation were choosing a single classification method solely based on
minimizing PM, the Mahalanobis method should be selected. Figure 4.7 presents the false
positive error rate PF of the classifier as a function of w_cell.

PF was defined mathematically as P[H1|H0] (H1 picked when H0 was true) and was
the ratio of position tracks incorrectly classified as suspicious compared to the total number
of non-suspicious tracks in the database. It was the likelihood that the classifier incorrectly
classified a non-suspicious track as suspicious. The consequences of mis-classifying a
non-suspicious position track as suspicious were not as severe as incorrectly classifying a
suspicious track as non-suspicious.
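
The definitions of PM and PF can be applied directly to the confusion matrices in Tables 4.1
and 4.2. The Python sketch below is an illustration only and is not the code used in this
research; the dictionary layout and the function name summarize are assumptions made for
the example. It computes the overall accuracy, PM and PF for each classification method
when w_cell was set to 270 feet.

```python
# Confusion-matrix counts from Tables 4.1 and 4.2 (w_cell = 270 feet).
# Each entry is (TN, FP, FN, TP) with "suspicious" treated as the positive class:
#   TN: non-suspicious tracks classified as non-suspicious
#   FP: non-suspicious tracks classified as suspicious
#   FN: suspicious tracks classified as non-suspicious
#   TP: suspicious tracks classified as suspicious
CONFUSION = {
    "LDF": (8, 2, 3, 9),
    "DLDF": (7, 3, 4, 8),
    "QDF": (9, 1, 0, 12),
    "Mahalanobis": (5, 5, 0, 12),
}


def summarize(tn, fp, fn, tp):
    """Return (accuracy, PM, PF) as percentages."""
    total = tn + fp + fn + tp
    accuracy = 100.0 * (tn + tp) / total
    pm = 100.0 * fn / (fn + tp)  # missed suspicious tracks / all suspicious tracks
    pf = 100.0 * fp / (fp + tn)  # false alarms / all non-suspicious tracks
    return accuracy, pm, pf


for method, counts in CONFUSION.items():
    acc, pm, pf = summarize(*counts)
    print(f"{method:12s} accuracy={acc:6.2f}%  PM={pm:6.2f}%  PF={pf:6.2f}%")
```

For the QDF counts, this gives an accuracy of 95.45% (21 of 22 tracks), a PM of 0% and a
PF of 10%.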

In the event that a non-suspicious position track was classified as suspicious, security
personnel would be dispatched to investigate the emitting object. The act of dispatching
personnel to investigate the emitter would expend resources in various forms such as
man-hours. The cost of missing a suspicious track and subsequently not engaging it, as
described earlier in this subsection, outweighs the cost of expending these resources to
investigate the uncertain intent of an emitter that was non-suspicious.

On average, the LDF classification method had the lowest PF rate: 11.82%.
The DLDF classification method had the second lowest PF rate: 13.18%. The QDF
classification method had a much higher average PF rate: 40.45%. The LDF classification
method yielded a PF of 0% for nine of the 22 different w_cell values, or 41% of the
resolution sweep. When w_cell was set to 270 feet, the QDF classification method had a PF
of 10.00%, which was the lowest PF that the QDF method achieved for the grid resolution
sweep.

If  a  military  installation  were  picking  a  single  classification  method  solely  based  on 
minimizing  PF,  the  LDF  method  should  be  selected.  The  quantitative  results  for  the  grid 
resolution  sweep  are  summarized  in  Table  4.3. 


Statistics                             LDF      DLDF     QDF      Mahalanobis
Average Accuracy (%)                   67.98    67.15    74.17    58.26
Standard Deviation of Accuracy (%)     10.63    10.58     8.58     6.82
Average PM (%)                         48.86    49.24    13.64     3.03
PM Standard Deviation (%)              11.87    12.04    10.46     4.10
Average PF (%)                         11.82    13.18    40.45    88.18
PF Standard Deviation (%)              14.02    13.59    11.33    13.68

Table 4.3: Summary of Classifier Statistics for Grid Resolution Sweep.

This subsection communicated the results of the grid resolution sweep. The grid cell
spacing parameter w_cell was incremented from 2.7 feet (or 1 image pixel unit) to 1,026 feet
(380 image pixel units). The overall classifier accuracy, PM and PF were displayed as a
function of w_cell. The figures were analyzed and recommendations were made concerning
which classification method was the most appropriate in terms of the three statistics. The
QDF classification method had the best overall accuracy of the four classification methods.
The Mahalanobis classification method had the lowest average PM rate and the LDF method
had the lowest average PF rate. In the next subsection, the results generated from parameter
sweeps of the suspicious class prior probability P[H1] are presented and analyzed.
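
Which method is preferable depends on how heavily a missed suspicious track is weighted
against a false alarm. The Python sketch below illustrates that trade-off using the average
PM and PF values from Table 4.3. The cost values, the equal prior probabilities and the
function names are hypothetical and were not part of this research; the sketch only shows
how the choice of method shifts as the relative cost of a miss grows.

```python
# Average error rates from Table 4.3 (grid resolution sweep), as fractions.
AVERAGE_RATES = {
    "LDF": {"pm": 0.4886, "pf": 0.1182},
    "DLDF": {"pm": 0.4924, "pf": 0.1318},
    "QDF": {"pm": 0.1364, "pf": 0.4045},
    "Mahalanobis": {"pm": 0.0303, "pf": 0.8818},
}


def expected_cost(pm, pf, cost_miss, cost_false_alarm, p_h1=0.5):
    """Expected cost per track; the costs and the prior p_h1 are assumptions."""
    p_h0 = 1.0 - p_h1
    return cost_miss * pm * p_h1 + cost_false_alarm * pf * p_h0


def cheapest_method(cost_miss, cost_false_alarm, p_h1=0.5):
    """Return the method with the lowest expected cost for the given weights."""
    return min(
        AVERAGE_RATES,
        key=lambda m: expected_cost(
            AVERAGE_RATES[m]["pm"], AVERAGE_RATES[m]["pf"],
            cost_miss, cost_false_alarm, p_h1,
        ),
    )


# Hypothetical weightings: a miss costs the same as, or ten times, a false alarm.
for ratio in (1, 10):
    print(f"miss/false-alarm cost ratio {ratio}: {cheapest_method(ratio, 1)}")
```

With equal costs the QDF method has the lowest expected cost, while a ten-to-one weighting
in favor of avoiding misses selects the Mahalanobis method.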

4.2.2  Prior  Probability  Sweep  Results 

In this subsection, a second parameter sweep was performed and the results were
observed. The prior probability of the suspicious class P[H1] was incremented from 0.5 to
0.9 in 0.05 increments while the grid cell spacing parameter w_cell was kept constant. The
P[H1] value informed the classifier of the likelihood that a position track was suspicious.
The P[H1] sweep started at 0.5 and was increased from that value because it was not
appropriate in the context of the intent assessment in this research to have a P[H1] value
less than 0.5. A P[H1] value less than 0.5 meant that the likelihood of a suspicious track
occurring was less than 50%, or half the time.

If the classifier was informed that the likelihood of a suspicious track occurring was
less than 50%, it was more likely to classify a track as non-suspicious. Consequently, the
PM rate would increase, which would have adverse effects for a military installation (as
discussed in subsection 4.2.1). As P[H1] was incremented from 0.5 to 0.9, P[H0] was
decremented from 0.5 to 0.1, so that P[H1] and P[H0] always summed to one. P[H0]
was the prior probability of the non-suspicious class and likewise informed the classifier
of the likelihood that a position track was non-suspicious.

The P[H1] sweep was performed to observe the classifier’s overall accuracy, PM and
PF probabilities as P[H1] increased. As P[H1] increased, it was expected that PM would
decrease and PF would increase. PM would decrease as P[H1] increased because the
classifier would be more sensitive to suspicious tracks and more likely to classify a
track as suspicious, so the number of suspicious position tracks incorrectly classified as
non-suspicious would decrease. Likewise, PF would increase as P[H1] increased because
the same sensitivity would increase the number of non-suspicious tracks classified as
suspicious.
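
The expected effect of the prior probability can be illustrated with a one-dimensional
Gaussian discriminant. The Python sketch below is not the classifier used in this research:
the single feature, the class means and variances, and the function names are hypothetical.
It shows only the mechanism, namely that the logarithm of the prior is added to each class
score, so that raising P[H1] moves a borderline track from non-suspicious to suspicious.

```python
import math


def discriminant(x, mean, variance, prior):
    """Gaussian log-likelihood plus log-prior, dropping the shared constant."""
    return (-0.5 * math.log(variance)
            - (x - mean) ** 2 / (2.0 * variance)
            + math.log(prior))


def classify(x, p_h1):
    """Pick the class with the larger discriminant for a given suspicious prior."""
    # Hypothetical class models for one made-up feature (not thesis data).
    g0 = discriminant(x, mean=0.0, variance=1.0, prior=1.0 - p_h1)  # H0
    g1 = discriminant(x, mean=2.0, variance=1.0, prior=p_h1)        # H1
    return "suspicious" if g1 > g0 else "non-suspicious"


x = 0.9  # a borderline track, slightly closer to the non-suspicious mean
for p_h1 in (0.5, 0.7, 0.9):
    print(f"P[H1] = {p_h1:.1f}: {classify(x, p_h1)}")
```

With equal priors the borderline value is classified as non-suspicious; at P[H1] values of
0.7 and 0.9 it is classified as suspicious. A rule that compares only Mahalanobis distances
contains no prior term, which would be consistent with that method's results remaining
constant over a P[H1] sweep.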

The classifier classified each position track in the database for each different P[H1]
value using the four classification methods. The results of two P[H1] sweeps conducted for
two different w_cell values are presented here. The w_cell values chosen for this subsection’s
results were 270 and 918 feet. w_cell was set to 270 feet for the first P[H1] sweep because
the highest accuracy for the QDF classification method (95.45%) occurred at this grid cell
width value. Therefore, a P[H1] sweep with this w_cell value would theoretically yield the
best classifier results. w_cell was set to 918 feet for the second P[H1] sweep so that the
first sweep’s results could be compared to a P[H1] sweep with w_cell set to a value in the
top tenth of the grid resolution spectrum.

The classifier’s overall accuracy, PM and PF rates vs. P[H1] are presented for each
P[H1] sweep. Figure 4.8 presents the overall accuracy of the classifier for a sweep of
P[H1] when w_cell was set to 270 feet.

Figure 4.8: Classifier Accuracy for P[H1] Sweep: w_cell = 270 feet.


The QDF classification method was the most accurate of the four classification
methods for the entire sweep of P[H1] when w_cell was set to 270 feet. The average
accuracy for the QDF method for this sweep was 89.90%. As P[H1] increased,
the accuracy of the QDF classification method gradually decreased, and never increased.

The best accuracy for the QDF method was 95.45% and occurred when P[H1] was 0.5
and 0.55. The QDF’s accuracy standard deviation for the P[H1] sweep was 4.97%. This
method’s accuracy varied less for this sweep when compared to the method’s accuracy for
the grid resolution sweep that had a standard deviation of 8.58%.

The accuracy for the Mahalanobis method remained constant at 77.27%. The accuracy
for the LDF classification method increased slightly at first and then decreased slightly as
P[H1] increased. Finally, the DLDF classification method’s accuracy decreased slightly
when P[H1] was 0.6 and then increased to 72.73% when P[H1] was 0.85. All four
classification methods’ accuracies varied less in this P[H1] sweep when w_cell was set to
270 feet compared to the classification methods’ accuracies for the grid resolution sweep.

Figure 4.9 presents the false negative error rate PM of the classifier as a function of
P[H1] for the four classification methods. w_cell was set to 270 feet.

Figure 4.9: PM Performance for P[H1] Sweep: w_cell = 270 feet.


The  QDF  and  Mahalanobis  classification  methods  had  PM  values  of  0%  for  the  entire 
sweep.  This  means  that  there  were  no  false  negative  errors  for  these  two  methods.  The 
classification  methods  correctly  classified  all  suspicious  tracks  as  suspicious.  This  was 
ideal  because  as  stated  in  subsection  4.2.1,  the  consequences  of  mis-classifying  a  suspicious 
position  track  as  non-suspicious  were  severe. 

The DLDF classification method started with a PM value of 33% when P[H1] was 0.5.
As P[H1] increased, PM for this method decreased, leveled once at 25% and then decreased
to 0% for P[H1] values of 0.85 and 0.9. Finally, the LDF classification method had a PM
value of 25% at the start of the sweep when P[H1] was 0.5. As P[H1] increased, the PM for
this method decreased linearly to 8.3% when P[H1] was 0.6, and remained at 8.3% for the
rest of the sweep.

The standard deviations of the PM values for the LDF, QDF and Mahalanobis
classification methods for this P[H1] sweep were less than the standard deviations of the
PM values for the same classification methods for the grid resolution sweep. The DLDF
classification method’s PM standard deviation for the P[H1] sweep was greater than the
method’s corresponding standard deviation for the grid resolution sweep.

It was expected that as P[H1] approached 0.9 during the sweep, the PM rates for
all classification methods would approach 0%. As P[H1] increased, the classifier became
more sensitive to suspicious tracks and was more likely to classify a position track as
suspicious. It was shown in this figure that the number of false negative errors decreased
for the DLDF and LDF classification methods as P[H1] increased. As stated earlier, the
QDF and Mahalanobis classification methods both had PM rates of 0% during the entire
sweep of P[H1]. The LDF method never achieved a PM of 0%. The QDF and Mahalanobis
classification methods had the best overall PM rates (0%) for the P[H1] sweep when w_cell
was 270 feet.

Figure 4.10 illustrates the false positive error rates PF of the four classification
methods for the P[H1] sweep. w_cell was set to 270 feet.

Figure 4.10: PF Performance for P[H1] Sweep: w_cell = 270 feet.


The QDF classification method had the lowest average PF value: 22.22%. The method
had a PF value of 10% at the beginning of the sweep. The PF rate then increased to 20%
and leveled until P[H1] was 0.8. The PF value then increased to 40% at the end of the
sweep. The ideal P[H1] value for this method was 0.5. When P[H1] was 0.5, the QDF
method had no false negative errors (Figure 4.9), and only 10% false positive errors.

The LDF classification method had the second-best average PF rate: 33.33%. The
method had a PF value of 20% at the beginning of the sweep. The PF rate then increased
linearly to 40% and leveled until P[H1] was 0.85. The PF value then increased to 50% at
the end of the sweep. The ideal P[H1] value for this method was 0.6. When P[H1] was
0.6, the LDF method incorrectly classified 8.3% of the suspicious tracks as non-suspicious
(Figure 4.9), and incorrectly classified 20% of the non-suspicious tracks as suspicious.

The Mahalanobis and DLDF classification methods had the worst average PF rates:
50%. The Mahalanobis method remained at 50% for the entire sweep. It consistently
classified half of the non-suspicious tracks in the database as suspicious during the sweep.
The DLDF method had a PF value of 30% at the start of the sweep. As P[H1] increased,
PF for this method gradually climbed in a stair-step manner until P[H1] was 0.85. The PF
value then increased from 60% to 90% when P[H1] increased from 0.85 to 0.9.

It was expected that the PF rates for all classification methods would increase as P[H1]
approached 0.9 during the sweep. As P[H1] increased, the classifier became more sensitive
to suspicious tracks and was more likely to classify any position track as suspicious. It
was shown in Figure 4.10 that the number of false positive errors increased for all of the
classification methods as P[H1] increased except the Mahalanobis method. The quantitative
results for this P[H1] sweep with w_cell set to 270 feet are summarized in Table 4.4.


Statistics                             LDF      DLDF     QDF      Mahalanobis
Average Accuracy (%)                   78.79    65.66    89.90    77.27
Standard Deviation of Accuracy (%)      3.94     4.01     4.97     0
Average PM (%)                         11.11    21.30     0        0
PM Standard Deviation (%)               5.89    14.50     0        0
Average PF (%)                         33.33    50.00    22.22    50.00
PF Standard Deviation (%)              11.18    18.71    10.93     0

Table 4.4: Classifier Statistics for P[H1] Sweep: w_cell = 270 feet.
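
The summary statistics in Table 4.4 can be cross-checked against the sweep values described
in the text. The Python sketch below does this for the LDF method's PM values. It is an
illustration only: the PM value at P[H1] = 0.55 is not stated in the text and is assumed
here to be 2 of 12 suspicious tracks, based on the described linear decrease from 25% to
8.3%, and the use of the sample standard deviation is likewise an assumption.

```python
import statistics

# LDF PM values (%) for P[H1] = 0.50, 0.55, ..., 0.90 with w_cell = 270 feet.
# 3 of 12 misses (25%) at 0.50 and 1 of 12 misses (8.3%) from 0.60 onward come
# from the text; 2 of 12 misses at 0.55 is an assumed intermediate value.
missed_tracks = (3, 2, 1, 1, 1, 1, 1, 1, 1)
ldf_pm = [100.0 * misses / 12 for misses in missed_tracks]

average_pm = statistics.mean(ldf_pm)
pm_std = statistics.stdev(ldf_pm)  # sample standard deviation (n - 1 divisor)

print(f"Average PM            = {average_pm:.2f}%")
print(f"PM standard deviation = {pm_std:.2f}%")
```

Under these assumptions the sketch reproduces the LDF entries in Table 4.4, an average PM
of 11.11% and a PM standard deviation of 5.89%.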


A P[H1] sweep was conducted with w_cell set to 270 feet. The overall accuracy, PM and
PF rates vs. P[H1] were presented and analyzed for this sweep. A second P[H1] sweep was
conducted with w_cell set to 918 feet. The overall accuracy, PM and PF rates vs. P[H1] are
now presented for this sweep. Figure 4.11 illustrates the overall accuracy of the classifier
for this second P[H1] sweep with w_cell set to 918 feet.

Figure 4.11: Classifier Accuracy for P[H1] Sweep: w_cell = 918 feet.


The QDF classification method again had the highest overall accuracy for this sweep.
However, all four classification methods’ average accuracies were lower in this P[H1]
sweep than the average accuracies in the previous sweep when w_cell was set to 270 feet.
The average accuracy for the QDF method for this sweep was 71.21%. The QDF
method had an accuracy of 72.73% at the start of the sweep. It remained at 72.73% until
P[H1] was 0.75. At that point it decreased to and remained at 68.18% for the remainder of
the sweep.

The  standard  deviation  of  the  QDF  classification  method’s  accuracy  for  this  sweep  was 
2.27%.  This  method’s  accuracy  varied  less  for  this  sweep  when  compared  to  the  accuracy 
for  the  grid  resolution  sweep  that  had  a  standard  deviation  of  8.58%. 




The accuracy for the Mahalanobis method remained constant at 50.00%. The accuracy
for the LDF classification method fluctuated at first and then, beginning when P[H1] was
0.6, descended to 36.36%. Finally, the DLDF method’s accuracy first increased slightly to
72.73% and then, beginning when P[H1] was 0.55, gradually decreased to 50.00%.

Figure 4.12 presents the false negative error rate PM of the classifier as a function of
P[H1] for the four classification methods. w_cell was set to 918 feet.

Figure 4.12: PM Performance for P[H1] Sweep: w_cell = 918 feet.


The Mahalanobis classification method had the best average PM rate: 8.33%. It
remained at this value for the entire sweep. The QDF classification method had the
second-best average PM rate: 25.00%. It remained at this value for the entire sweep. The DLDF
classification method had a PM value of 60% at the beginning of the sweep and decreased
to 10% as P[H1] increased. Finally, the LDF method had a PM value of 50% when P[H1]
was 0.5. As P[H1] increased, the PM rate for this method decreased to 30%.

It was again expected that as P[H1] approached 0.9 during the sweep, the PM
rates for all classification methods would approach 0%. As P[H1] increased, the classifier
became more sensitive to suspicious tracks and was more likely to classify any position
track as suspicious. It was shown in this figure that the number of false negative errors
decreased for the DLDF and LDF classification methods as P[H1] increased. As stated
earlier, the QDF and Mahalanobis classification methods had constant PM rates of 25% and
8.33%, respectively, during the entire sweep of P[H1].

Figure 4.13 illustrates the false positive error rates PF of the four classification
methods for the P[H1] sweep. w_cell was set to 918 feet.

The QDF classification method had the lowest average PF value: 33.33%. This
method had a PF value of 30% at the beginning of the sweep. The PF rate increased to 40%
when P[H1] was 0.8 and remained there for the rest of the sweep. The ideal P[H1] range
for this method was 0.5 to 0.75. When P[H1] was within this range, the QDF method had a
false negative error rate of 25% (Figure 4.12), and a false positive error rate of 30%. In
other words, 25% of suspicious position tracks in the database were incorrectly classified
as non-suspicious, and 30% of non-suspicious tracks in the database were incorrectly
classified as suspicious.

The LDF classification method had the next best average PF rate: 47.78%. The PF
rate for this method increased steeply when P[H1] increased from 0.6 to 0.7. When P[H1]
was 0.9, the LDF method had a 100% false alarm rate. The DLDF classification method
had an average PF rate of 52.22%. The PF rate for this method increased steeply when
P[H1] increased from 0.55 to 0.7. When P[H1] was 0.9, the DLDF method had a 100%
false alarm rate.

The Mahalanobis classification method had the worst average PF rate: 100%. The
Mahalanobis method remained at 100% for the entire sweep. It consistently classified all
of the non-suspicious tracks in the database as suspicious during the sweep. The standard
deviations of the PF rates for the LDF, DLDF and QDF classification methods were greater
than the standard deviations of their PM rates for this P[H1] sweep. The Mahalanobis method
had PM and PF standard deviations of 0% for this second P[H1] sweep.

It was expected that the PF rates for all classification methods would increase as P[H1]
approached 0.9 during the sweep. As P[H1] increased, the classifier became more sensitive
to suspicious tracks and was more likely to classify any position track as suspicious. It
was shown in Figure 4.13 that the number of false positive errors increased for all of the
classification methods as P[H1] increased except the Mahalanobis method. The quantitative
results for this P[H1] sweep with w_cell set to 918 feet are summarized in Table 4.5.


Statistics                             LDF      DLDF     QDF      Mahalanobis
Average Accuracy (%)                   58.08    61.11    71.21    50.00
Standard Deviation of Accuracy (%)     14.33     6.86     2.27     0
Average PM (%)                         37.04    27.78    25.00     8.33
PM Standard Deviation (%)               7.35    20.41     0        0
Average PF (%)                         47.78    52.22    33.33   100
PF Standard Deviation (%)              36.67    38.33     5.00     0

Table 4.5: Classifier Statistics for P[H1] Sweep: w_cell = 918 feet.

Classifier statistics for the P[H1] parameter sweep of two different grid resolutions
were presented and discussed in this subsection. w_cell was set to 270 feet for the first sweep
and 918 feet for the second sweep. The next subsection presents and analyzes ROC curves
generated using the PF and PM data from the P[H1] sweeps at these two grid resolutions.

4.2.3  ROC  Curve  Analysis 

This subsection used PF and PM data to calculate best fit ROC curves for the four
classification methods. First, P[H1] was swept from 0.0 to 1.0 in 0.05 increments in order
to generate the largest number of unique PF-PM data pairs. This parameter sweep resulted
in 21 pairs of PF and PM data. In some cases, the PF and PM data pairs had one or both
values in the pair that were identical to one or both values in a different data pair. In
Chapter II, it was shown in Equation (2.16) that the PD rate could be determined if the PM
rate was known.


PD = 1 - PM    (4.1)

PD data was generated using this equation. The PF and PD data points were then plotted
in the x-y coordinate format (PF, PD). This process was repeated for all four classification
methods. The MATLAB® command polyfit was used to generate best-fit curves from the
PF and PD data for each classification method. Through trial and error, the most appropriate
polynomial degree was chosen for each classification method’s best fit curve through the
data points.
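
The procedure described above can be sketched in a few lines. The Python example below is
a stand-in for illustration and is not the MATLAB® code used in this research: it replaces
polyfit with a small least-squares solver based on the normal equations, and the function
names and the (PF, PM) pairs are hypothetical. It converts the pairs to unique (PF, PD)
points using Equation (4.1) and fits a polynomial of a chosen degree through them.

```python
def roc_points(pf_pm_pairs):
    """Convert (PF, PM) pairs to sorted, unique (PF, PD) points with PD = 1 - PM."""
    return sorted({(pf, round(1.0 - pm, 4)) for pf, pm in pf_pm_pairs})


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients, highest power first."""
    n = degree + 1
    # Normal equations A c = b, with c ordered from lowest to highest power.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs[::-1]


def evaluate_polynomial(coeffs, x):
    """Evaluate highest-power-first coefficients at x using Horner's rule."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


# Hypothetical (PF, PM) pairs from a prior sweep, including repeated pairs.
sweep = [(0.0, 1.0), (0.0, 1.0), (0.25, 0.5625), (0.5, 0.25),
         (0.5, 0.25), (0.75, 0.0625), (1.0, 0.0)]
points = roc_points(sweep)
coeffs = fit_polynomial([p[0] for p in points], [p[1] for p in points], degree=2)
print(points)
print([round(c, 4) for c in coeffs])
```

The made-up pairs lie on the curve PD = 2 PF - PF^2, so the fitted coefficients are close to
-1, 2 and 0. Repeated pairs are removed before fitting, which mirrors the use of unique data
points in the text.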

Figure 4.14 shows the PF and PD data points and corresponding best fit curves for
each of the classification methods when w_cell was set to 270 feet.

Figure 4.14: PD vs. PF Data Points and Best Fit ROC Curves: w_cell = 270 feet.


There were 13 unique PF-PD data points for the LDF classification method that were
used to generate the ROC curve. A polynomial of degree four was the most appropriate
for the PF-PD data points. The ROC curve for the LDF method was concave-down. There
were 15 unique PF-PD data points for the DLDF classification method that were used to
generate the ROC curve. A polynomial of degree two was the most appropriate for the
PF-PD data points. The ROC curve for the DLDF method was also concave-down.

There were seven unique PF-PD data points for the QDF classification method that
were used to generate the ROC curve. A polynomial of degree four was the most
appropriate for the PF-PD data points. The ROC curve for the QDF method was
concave-down. There was no curve generated for the Mahalanobis classification method because
this method operated at only one PF-PD data point. The data point for this grid resolution
configuration was (0.5, 1).

Figure 4.15 shows the best fit curves for each of the classification methods with the
random guess line displayed when w_cell was set to 270 feet.

Figure 4.15: Best Fit ROC Curves: w_cell = 270 feet.

The three generated ROC curves and Mahalanobis point resided above the random
guess line for the entire range of PF. The QDF classification method had the most efficient
ROC curve. The point on the curve where this classification method performed the best was
at the PF-PD coordinates (0.1, 1). The method had a PD of 100% and a PF of only 10%.
At this point on the curve, the QDF method classified every suspicious track as suspicious,
and only 10% of non-suspicious tracks were classified as suspicious.

The Mahalanobis method operated at only one PF-PD data point. This point was at
the PF-PD coordinates (0.5, 1). At this point, the classification method had a PD of 100%
and a PF of 50%. The method classified every suspicious track as suspicious and 50%, or
half, of the non-suspicious tracks as suspicious. While this point was not as accurate as the
(0.1, 1) point on the QDF ROC curve, it was still above the random guess line.

The LDF classification method’s best fit ROC curve was not as ideal as the QDF
method’s curve but was still above the random guess line for the entire range of PF. The
method achieved a PD rate of about 0.77 when PF was 0.2, and a PD of about 0.92 when
PF was 0.40. The DLDF classification method’s best fit ROC curve also was not as ideal as
the QDF method’s curve but was still above the random guess line for all PF values. The
method achieved a PD rate of about 0.75 when PF was 0.4, and a PD of 0.9 when PF was
0.46.

Figure 4.16 shows the PF and PD data points and corresponding best fit curves for
each of the classification methods when w_cell was set to 918 feet.

Figure 4.16: PD vs. PF Data Points and Best Fit ROC Curves: w_cell = 918 feet.


There were 15 unique PF-PD data points for the LDF classification method that were
used to generate the ROC curve. A polynomial of degree six was the most appropriate for
the PF-PD data points. The ROC curve for the LDF method was concave-down. There
were 12 unique PF-PD data points for the DLDF classification method that were used to
generate its ROC curve. A polynomial of degree four was the most appropriate for the
PF-PD data points. The ROC curve for the DLDF method was also concave-down.

There were five unique PF-PD data points for the QDF classification method that were
used to generate the ROC curve. A polynomial of degree four was the most appropriate for
the PF-PD data points. The ROC curve for the QDF method was also concave-down.
There was again no curve generated for the Mahalanobis classification method
because this method operated at only one PF-PD data point. The data point for this grid
resolution configuration was (1, 0.92).

Figure 4.17 shows the best fit curves for each of the classification methods with the
random guess line when w_cell was set to 918 feet.

Figure 4.17: Best Fit ROC Curves: w_cell = 918 feet.


The best fit ROC curve for the QDF classification method at this grid configuration
was the only curve that stayed above the random guess line for the entire range of PF.
The best fit ROC curves for the LDF and DLDF classification methods eventually passed
under the random guess line as PF increased from 0 to 1. The Mahalanobis data point also
resided under the random guess line.

The QDF classification method was not as ideal for this set of PF-PD data compared
to the grid configuration when w_cell was set to 270 feet. When PF was 0.2, PD was about
0.70. When PF was 0.4, PD for the QDF method only increased to 0.75. It can be asserted
that the most ideal point on the curve was (0.2, 0.70). This point on the curve was above
the random guess line.

The DLDF classification method did not perform as well for this set of PF-PD data
compared to when w_cell was set to 270 feet. When PF was 0.2, PD was about 0.5. When PF
was 0.6, PD for this method was about 0.8. It can be asserted that the most ideal point
on the curve was (0.4, 0.65). This point on the curve was still above the random guess line.

The LDF classification method also did not perform as well for this set of PF-PD data as
it did when w_cell was set to 270 feet. The curve achieved a PD value of about 0.65
relatively early, when PF was 0.2. PD for this method then remained at about 0.65 as PF
increased toward 1, and the curve passed under the random guess line when PF was 0.66. It
can be asserted that the best operating point on the curve was (0.3, 0.66). This point on
the curve was still above the random guess line.

Finally, the Mahalanobis method again operated at only one PF-PD point on the figure.
This point was at the coordinates (1, 0.92) and was located under the random guess line.
At this point, the classification method had a PD of 92% and a PF of 100%. The method
incorrectly classified every non-suspicious track in the database as suspicious, and
correctly classified 91.67% (11 of 12) of the suspicious tracks in the database as
suspicious.

The  QDF  classification  method  had  the  most  accurate  best  fit  ROC  curve  for  this  grid 
resolution  configuration.  At  the  point  where  it  was  most  efficient,  about  70%  of  suspicious 
tracks  were  correctly  classified  as  suspicious  while  only  20%  of  non-suspicious  tracks  were 
incorrectly  classified  as  suspicious. 

The results of the generated PF and PD data and best-fit ROC curves when w_cell was
set to 270 and 918 feet were presented and analyzed in this subsection. PF and PM
data were generated for each classification method from a P[H1] parameter sweep from
0 to 1. PD data for each method were then calculated and the PF-PD data points were
plotted. Polynomial functions that represented best fit ROC curves were fit to the data
points using trial and error. The best fit ROC curves for each classification method for both
grid  resolution  configurations  were  presented  and  analyzed. 
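
The procedure summarized above can be sketched in Python. This is a minimal illustration
under stated assumptions and is not the code used in this research: the PF and PM values
are hypothetical placeholders, the names fit_roc, p_d and above_guess are invented for
the example, and NumPy's least-squares polyfit stands in for the trial-and-error fitting
described in the text.

```python
import numpy as np

# Hypothetical PF/PM values from a prior-probability sweep (placeholders only;
# these are NOT the measured results of this research).
pf = np.array([0.0, 0.1, 0.2, 0.3, 0.5, 0.7, 0.9, 1.0])
pm = np.array([0.9, 0.5, 0.3, 0.2, 0.1, 0.05, 0.0, 0.0])


def fit_roc(pf, pm, degree=4):
    """Convert PM to PD and fit a polynomial ROC curve PD = f(PF)."""
    p_d = 1.0 - np.asarray(pm, dtype=float)   # detection = 1 - miss
    coeffs = np.polyfit(pf, p_d, degree)      # least-squares polynomial fit
    return p_d, np.poly1d(coeffs)


p_d, roc = fit_roc(pf, pm)

# Compare the fitted curve with the random guess line PD = PF.
above_guess = roc(pf) > pf
```

Evaluating roc on a fine grid of PF values and comparing it with PF itself is one way to
locate where a fitted curve passes under the random guess line.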

4.3 Chapter Summary

This chapter presented the results that were attained in this research. First, four
cross-sectional feature generation data plots were presented and discussed in section 4.1.
Then, the classifier results and statistics were presented in section 4.2. That section
first presented the classifier results for the grid resolution sweep. As emphasized in
subsection 4.2.1, the QDF classification method outperformed the other three classification
methods in overall accuracy. Then, two P[H1] sweeps were conducted for w_cell values of
270 and 918 feet. Finally, PF-PD data points were generated from a P[H1] sweep from 0 to 1
and best fit ROC curves were fit to the data in subsection 4.2.3.

This chapter communicated that the QDF classification method outperformed the other
classification methods used in this research. This method performed the best when the grid
cell width w_cell was set to 270 feet. At this grid resolution, the QDF accuracy was 95.45%
when it classified the 22 position tracks one at a time. The average PM rate during the
P[H1] sweep for the QDF method at w_cell = 270 feet was 0% and the average PF rate was
22.22%. The lowest PF and PM rates for the QDF method were 10% and 0%, respectively. The
prior probabilities for the non-suspicious and suspicious classes (P[H0] and P[H1]
respectively) were both set to 0.5 (50%) for this system configuration.

Chapter  V  is  the  final  chapter  and  presents  a  summary  and  conclusion  of  this  research. 
Additionally,  a  future  work  section  is  included  which  outlines  areas  of  future  research  that 
can  be  conducted. 

V. Summary, Conclusions and Future Work


This  chapter  summarizes  this  thesis  and  the  results  produced  by  it.  It  also  details 
significant  conclusions  drawn  from  the  research  and  areas  for  future  work. 

5.1  Summary 

The purpose of this research was to determine the feasibility of an RF emitter tracking
and intent assessment system as a means for enhanced physical security of a military
installation.
Chapter  I  introduced  a  brief  background  on  this  research  area  and  included  the  problem 
statement,  the  scope  and  application  of  the  research,  the  research  objectives,  equipment 
needed  and  the  motivation  for  this  thesis. 

A more detailed background of source localization and pattern recognition concepts
was provided in chapter II. Additionally, current research thrusts pertaining to employing
pattern  recognition  techniques  within  a  WSN  to  perform  anomaly  detection  both  in  the 
physical  and  cyber  domains  were  discussed.  Chapter  III  communicated  the  methodologies 
used  in  this  research.  The  chapter  covered  the  standard  operation  of  the  Magellan®  Mobile 
GPS  unit,  position  track  processing,  feature  generation  from  the  position  track  database 
and  classification  of  the  position  tracks. 

The results of the intent assessment were presented in chapter IV. First, the discriminant
lines and curves for the four classification methods used in this research were plotted over
cross-sectional generated feature data. These figures displayed data for two features at a
time. The classifier statistics generated from two parameter sweeps were then presented.
The two parameters were the grid cell spacing parameter (w_cell) and the suspicious class
prior probability (P[H1]). The classifier’s accuracy as well as PM and PF errors were
analyzed for both the grid resolution and P[H1] sweeps.

Finally, best fit ROC curves were generated using the PF and PM data from the P[H1]
sweep. The PM rates were converted into PD rates and the data points were plotted in the
x-y coordinate plane. Best fit ROC curves were generated to fit the PF and PD data for each
of the four classification methods. The best fit ROC curves for each classification method
were analyzed and compared.

5.2  Conclusions 

This research has shown that it is possible to correctly classify position tracks as
non-suspicious or suspicious using the feature data generated from them. In this research,
data from five different features was generated for each position track. However, a position
track can be classified as non-suspicious or suspicious using just one set of feature data.
As the number of features used in the feature generation processes increased, the classifier
accuracy also increased.

Chapter IV explained that the QDF classification method outperformed the other
classification methods used in this research. This method performed the best when the grid
cell width w_cell was set to 270 feet. At this grid resolution, the QDF accuracy was 95.45%
when it classified the 22 position tracks one at a time. The average PM rate during the
P[H1] sweep for the QDF method at w_cell = 270 feet was 0% and the average PF rate was
22.22%. The lowest PF and PM rates for the QDF method were 10% and 0%, respectively. The
prior probabilities for the non-suspicious and suspicious classes (P[H0] and P[H1]
respectively) were both set to 0.5 (50%) for this system configuration.

This research confirms the feasibility and practicality of implementing an RF emitter
tracking and intent assessment system for a military installation with the aim of improving
physical security. The data and results that were produced by this research and
communicated in chapter IV show that accurate feature data can be generated from position
tracks and passed to a classifier using the QDF classification method to determine if the
behavior is suspicious or non-suspicious. The next section discusses areas for future work
and research pertaining to this thesis.

5.3  Future  Work 

There are many areas in this research field in which future work can be conducted. The
following  subsections  communicate  how  further  advancements  can  be  realized  through 
additional  research. 

5.3.1  Implementing  Accurate  Real-time  Geolocation  with  TelosB  Motes 

A  grid  of  wireless  Memsic®  TelosB  sensors  that  geolocate  an  emitter  in  motion  can 
replace  the  requirement  for  collecting  position  tracks  with  the  GPS  unit  for  the  position 
track  database.  The  base  station  would  collect  the  RSS  values  from  each  sensor  and  pass 
a  complete  RSS  data  set  in  real-time  to  an  estimation  algorithm.  One  possible  algorithm  is 
the MLE. A position track could then be created from the x-y position estimates produced by
the  estimator.  The  feature  generation  and  classification  processes  described  in  chapters  II 
and  III  of  this  thesis  could  be  applied  in  the  same  manner. 
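
The estimation step can be illustrated with a short Python sketch. It is a hedged example
and not an implementation from this thesis: it assumes a log-distance path loss model with
a known reference power (p0) and path loss exponent (n), and independent Gaussian noise on
each RSS reading, in which case the maximum likelihood estimate is the position that
minimizes the sum of squared RSS residuals. The sensor layout, the 2000-foot extent, the
grid step and all function names are hypothetical.

```python
import numpy as np


def predicted_rss(pos, sensors, p0=-40.0, n=3.0):
    """Assumed log-distance path loss model: RSS (dBm) seen by each sensor."""
    d = np.linalg.norm(sensors - pos, axis=1)
    return p0 - 10.0 * n * np.log10(np.maximum(d, 1.0))  # clamp d to >= 1 foot


def mle_position(rss, sensors, extent=2000.0, step=50.0):
    """Grid-search MLE. With i.i.d. Gaussian noise, the ML estimate is the
    candidate position with the smallest sum of squared RSS residuals."""
    best, best_cost = None, np.inf
    for x in np.arange(0.0, extent + step, step):
        for y in np.arange(0.0, extent + step, step):
            cand = np.array([x, y])
            cost = np.sum((rss - predicted_rss(cand, sensors)) ** 2)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best


# Hypothetical four-sensor layout (feet) and a noise-free reading set.
sensors = np.array([[0.0, 0.0], [2000.0, 0.0], [0.0, 2000.0], [2000.0, 2000.0]])
true_pos = np.array([500.0, 1200.0])
rss = predicted_rss(true_pos, sensors)
estimate = mle_position(rss, sensors)
```

Calling mle_position once per time-stamp and collecting the estimates in order would give
the x-y position track that the feature generation process takes as input.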

5.3.2  Tracking  Multiple  Emitters  Simultaneously  in  a  WSN 

Another  specific  area  that  can  be  researched  is  the  ability  to  simultaneously  track 
multiple  emitters  in  real-time.  For  this  to  be  possible,  the  intent  assessment  system  must 
include  functionality  to  differentiate  one  emitter  from  another.  The  system  would  keep 
track  of  differences  in  communication  protocol  or  signal  structure  between  two  or  more 
transmitters.  One  example  of  a  possible  distinguisher  for  different  RF  devices  is  the  Media 
Access  Control  (MAC)  address. 

The RSS values from the sensors would be separated into exclusive RSS data sets
for each emitter in the network. A position estimator would take each data set as a
separate input in order to produce device-specific x-y position estimates. The position
tracks created from geolocating multiple emitters would also be processed separately, and
feature data would be generated separately for each emitter’s position track.

Each  emitter  in  the  WSN  would  be  classified  as  suspicious  or  non-suspicious.  If  any 
one  or  combination  of  multiple  emitters  being  tracked  and  monitored  in  the  network  were 
flagged  as  suspicious,  the  military  installation  security  personnel  would  be  notified. 
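
The bookkeeping described in this subsection can be sketched as follows. This is an
illustrative example only: it assumes each reading arrives as a (MAC address, sensor
identifier, RSS) tuple, and the function names, label strings and sample values are
hypothetical.

```python
from collections import defaultdict


def split_by_emitter(readings):
    """Group (mac, sensor_id, rss_dbm) tuples into one RSS data set per emitter."""
    per_emitter = defaultdict(dict)
    for mac, sensor_id, rss_dbm in readings:
        per_emitter[mac][sensor_id] = rss_dbm
    return dict(per_emitter)


def flagged_emitters(labels):
    """Return the emitters whose tracks were classified as suspicious."""
    return sorted(mac for mac, label in labels.items() if label == "suspicious")


# Hypothetical readings from two emitters seen by two sensors.
readings = [
    ("aa:01", 1, -52.0), ("bb:02", 1, -71.5),
    ("aa:01", 2, -60.3), ("bb:02", 2, -48.9),
]
rss_sets = split_by_emitter(readings)
```

Each entry of rss_sets would be passed to the position estimator on its own, and security
personnel would be notified whenever flagged_emitters returns a non-empty list.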

5.3.3  Increasing  the  Size  of  the  WSN 

Increasing the size of the data collection area used for this research (defined by the
overhead  image  presented  in  chapter  III)  would  allow  for  additional  landmark  features  to 
be  employed  in  the  classification  process.  Additionally,  position  tracks  collected  in  the  new 
area  would  have  increased  variability  and  diversity  which  would  create  broader  data  sets 
after  the  feature  generation  process. 

5.3.4  Increasing  the  Size  of  the  Position  Track  Database 

Collecting  more  position  tracks  within  the  area  defined  by  the  overhead  satellite  image 
would  increase  the  size  of  the  position  track  database.  Expanding  the  database  would 
increase  the  training  data  set  used  by  the  classifier  when  it  classifies  an  unknown  track 
using  the  LOOCV  method. 

Increasing  the  number  of  position  tracks  in  the  database  would  also  increase  the 
resolution of the PF and PD data presented in chapter IV. In this research the resolution
was only about 0.1 because the database used in this research was composed of only 10
non-suspicious and 12 suspicious tracks. As the number of position tracks in the
database increases, the standard deviation of PF would decrease, effectively increasing the
confidence  of  the  data. 
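
The arithmetic behind this statement can be made explicit with a small Python sketch. The
resolution follows directly from the class sizes; the standard deviation formula assumes a
simple binomial model for the estimated rate, which is an assumption of this example and
not something stated in the thesis. The function names and the value 0.2 are illustrative.

```python
import math


def rate_resolution(n_tracks):
    """Smallest step an error rate can take when estimated from n_tracks."""
    return 1.0 / n_tracks


def rate_std(p, n_tracks):
    """Standard deviation of an estimated rate under an assumed binomial model."""
    return math.sqrt(p * (1.0 - p) / n_tracks)


pf_step = rate_resolution(10)    # 10 non-suspicious tracks
pd_step = rate_resolution(12)    # 12 suspicious tracks
std_small = rate_std(0.2, 10)    # illustrative PF of 0.2 with 10 tracks
std_large = rate_std(0.2, 100)   # same PF with ten times as many tracks
```

Under this model the standard deviation shrinks with the square root of the number of
tracks, so a tenfold larger database reduces it by a factor of about 3.2.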

5.3.5  Improving  Feature  Generation 

The  methods  for  generating  feature  data  can  be  improved.  Research  can  be  conducted 
to  improve  the  five  existing  feature  generation  algorithms  discussed  in  chapter  III,  and  also 
to  create  new  features  that  can  be  used  for  classification. 

A more sophisticated repetition algorithm can be implemented to generate more
accurate repetition feature data. The algorithm would record the maximum Euclidean
distance that the position track reached before returning to a particular grid cell.
Additionally, the length of time (in time-stamps) that the track remains away from a particular
grid  cell  can  be  recorded. 
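
One possible form of such a repetition algorithm is sketched below in Python. It is a
hypothetical illustration and not the existing repetition algorithm from chapter III: it
assumes one position per time-stamp, measures distance from the center of the revisited
grid cell, and uses invented names (cell_of, excursions).

```python
import math


def cell_of(point, w_cell):
    """Map an (x, y) position in feet to integer grid cell indices."""
    return (int(point[0] // w_cell), int(point[1] // w_cell))


def excursions(track, w_cell):
    """For every return to a previously visited grid cell, record
    (cell, maximum Euclidean distance from the cell center while away,
    number of time-stamps spent away)."""
    last_seen = {}
    results = []
    for i, point in enumerate(track):
        cell = cell_of(point, w_cell)
        j = last_seen.get(cell)
        if j is not None and i - j > 1:
            center = ((cell[0] + 0.5) * w_cell, (cell[1] + 0.5) * w_cell)
            away = track[j + 1:i]
            max_dist = max(math.dist(q, center) for q in away)
            results.append((cell, max_dist, len(away)))
        last_seen[cell] = i
    return results


# Hypothetical out-and-back track on a grid with w_cell = 270 feet.
track = [(10, 10), (300, 10), (600, 10), (300, 10), (10, 10)]
revisits = excursions(track, 270)
```

The recorded distances and durations could then be combined into a repetition score, for
example by weighting short, frequent returns more heavily; that scoring step is left open
here.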

An  extensive  investigation  can  be  performed  to  determine  the  complexity  and 
efficiencies  of  the  dwell  and  repetition  feature  algorithms.  Specifically,  the  study  can 
determine  how  the  grid  resolution  affects  the  accuracy  of  the  feature  data  generated  by 
these  two  algorithms  and  the  required  processing-times  on  a  micro-level. 

An  algorithm  can  be  developed  using  the  grid  described  in  chapter  III  to  determine  the 
direction  that  an  emitter  is  traveling.  This  can  be  accomplished  by  recording  the  grid  cell 
that the emitter currently resides in, and then recording the adjacent or corner grid cell
that the emitter moves to when it leaves the current cell. With this information, the
direction in which the emitter is traveling can be determined. This direction would be one
of the four cardinal or four intermediate directions: north, east, south, west, north-east, etc.

The  algorithm  would  then  record  the  number  of  times  that  the  emitter  in  motion 
changed  its  direction.  Suspicious  tracks  generally  possess  more  direction  changes  than 
non-suspicious  tracks  do.  Therefore,  this  direction  feature  algorithm  would  assign  a  high 
score  pertaining  to  suspicious  activity  for  a  position  track  that  changed  direction  frequently. 
A  low  score  would  be  assigned  to  a  track  that  changed  its  direction  minimally. 
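
The proposed direction feature can be sketched in Python as follows. This is a minimal
sketch under stated assumptions: the track is a non-empty list of (x, y) positions in
feet, y increases toward north, and a move that skips over cells is reduced to the sign of
its cell offset. The names cell_of, DIRECTIONS and direction_changes are hypothetical, and
the mapping from the change count to a score is not specified here.

```python
DIRECTIONS = {
    (0, 1): "north", (1, 1): "north-east", (1, 0): "east", (1, -1): "south-east",
    (0, -1): "south", (-1, -1): "south-west", (-1, 0): "west", (-1, 1): "north-west",
}


def cell_of(point, w_cell):
    """Map an (x, y) position in feet to integer grid cell indices."""
    return (int(point[0] // w_cell), int(point[1] // w_cell))


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def direction_changes(track, w_cell):
    """Return the heading of every cell-to-cell move and the number of
    times consecutive headings differ."""
    headings = []
    prev = cell_of(track[0], w_cell)
    for point in track[1:]:
        cur = cell_of(point, w_cell)
        if cur != prev:
            step = (sign(cur[0] - prev[0]), sign(cur[1] - prev[1]))
            headings.append(DIRECTIONS[step])
            prev = cur
    changes = sum(1 for a, b in zip(headings, headings[1:]) if a != b)
    return headings, changes


# Hypothetical track on a grid with w_cell = 270 feet.
track = [(10, 10), (300, 10), (600, 10), (600, 300), (300, 600)]
headings, changes = direction_changes(track, 270)
```

For this example track the emitter moves east twice, then north, then north-west, which
gives two direction changes.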

The  velocity  of  a  position  track  can  be  calculated  using  a  Kalman  filter.  The  filter  will 
observe  each  position  estimate  over  time  and  take  the  time  derivative  of  the  change  in  x-y 
position  to  determine  the  instantaneous  velocity. 
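
A minimal Python sketch of this idea is given below. It uses a standard constant-velocity
Kalman filter in which velocity is part of the state vector and is estimated from the
position measurements, which is a common formulation rather than an explicit derivative of
the positions. The time step dt, the noise levels q and r, the initial covariance and the
function name kalman_velocity are assumptions chosen for illustration only.

```python
import numpy as np


def kalman_velocity(positions, dt=1.0, q=1e-2, r=25.0):
    """Estimate (vx, vy) after each measurement with a constant-velocity
    Kalman filter. State vector: [x, y, vx, vy]; measurement: [x, y]."""
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)       # state transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], dtype=float)       # position-only measurement
    Q = q * np.eye(4)                               # assumed process noise
    R = r * np.eye(2)                               # assumed measurement noise
    x = np.array([positions[0][0], positions[0][1], 0.0, 0.0], dtype=float)
    P = np.diag([r, r, 1e3, 1e3])                   # velocity initially unknown
    velocities = []
    for z in positions[1:]:
        # Predict the next state and its covariance.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update with the new position measurement.
        innovation = np.asarray(z, dtype=float) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ innovation
        P = (np.eye(4) - K @ H) @ P
        velocities.append(x[2:].copy())
    return np.array(velocities)


# Hypothetical noise-free track moving at 5 ft and -3 ft per time-stamp.
track = [(5.0 * t, -3.0 * t) for t in range(50)]
velocities = kalman_velocity(track)
```

The speed at each time-stamp is the norm of the corresponding velocity row, which could
serve as an additional feature for the classifier.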

Additionally,  more  landmark  features  can  be  employed  in  the  future.  The  buildings 
that  comprise  AFIT  can  be  used  as  landmark  features.  Specifically,  the  exterior  doors  of 
the buildings can be set as non-suspicious in the landmark distance pixel map, and the rest
of the perimeters can be set as suspicious. In this manner, the landmark feature algorithm
for the buildings will attribute a low score corresponding to non-suspicious activity to
position tracks that come within close proximity to the doors, and conversely a high score
corresponding to suspicious activity when position tracks come within close proximity to
a building’s exterior wall or window.

Other  landmark  features  that  can  be  incorporated  into  the  classifier  are  the  WPAFB 
Area B perimeter fence and additional high-value areas such as other buildings and power
substations. With the addition of these new features, the accuracy of the classifier would
be expected to improve.

5.3.6  Using  Different  Classification  Methods 

Different  classification  methods  can  be  implemented  in  MATLAB®  for  the  intent 
classifier.  The  results  produced  by  these  new  methods  can  be  incorporated  with  the  results 
presented  in  chapter  IV  to  create  a  more  comprehensive  view  of  the  intent  assessment 
performance.  Radial  Basis  Functions  (RBFs)  and  NNs  can  be  implemented  to  perform 
position  track  classification.  Different  types  of  cross  validation  can  be  performed  on  the 
data  collected  in  this  research.  The  LOOCV  method  was  the  only  method  employed  in  this 
thesis.  Two-fold  cross  validation  is  one  type  of  cross  validation  that  can  be  implemented. 
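
The difference between the two cross validation schemes can be illustrated with a short
Python sketch of how the 22 position tracks would be split. The function names and the
fixed random seed are assumptions of this example, and the classifier itself is omitted.

```python
import random


def loocv_splits(n):
    """Yield (train_indices, test_index) pairs; each sample is held out once."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def two_fold_splits(n, seed=0):
    """Shuffle the indices, cut them into two halves, and use each half once
    for training and once for testing."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    half = n // 2
    first, second = idx[:half], idx[half:]
    return [(first, second), (second, first)]  # (train, test) pairs


loocv = list(loocv_splits(22))
two_fold = two_fold_splits(22)
```

With 22 tracks, LOOCV trains on 21 tracks for each of 22 classifications, whereas two-fold
cross validation trains on only 11 tracks at a time, so its accuracy estimate would be
expected to depend more strongly on how the tracks are divided.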

5.4  Chapter  Summary 

This  chapter  summarized  this  thesis  and  the  results  produced  by  it.  Significant 
conclusions  drawn  from  the  research  and  areas  for  future  work  were  presented. 

Appendix: Detailed Classifier Results

Position Track #    Class
1                   Suspicious
2                   Suspicious
3                   Suspicious
4                   Non-Suspicious
5                   Non-Suspicious
6                   Non-Suspicious
7                   Non-Suspicious
8                   Suspicious
9                   Suspicious
10                  Suspicious
11                  Non-Suspicious
12                  Suspicious
13                  Non-Suspicious
14                  Non-Suspicious
15                  Suspicious
16                  Non-Suspicious
17                  Suspicious
18                  Non-Suspicious
19                  Suspicious
20                  Suspicious

Table A.1: Classifier Database

Position Track #    Class
21                  Non-Suspicious
22                  Suspicious

Table A.2: Classifier Database

Figure A.1: Track 1 on Overhead Imagery.

Figure A.2: Track 2 on Overhead Imagery.

Figure A.3: Track 3 on Overhead Imagery.

Figure A.4: Track 4 on Overhead Imagery.

Figure A.5: Track 5 on Overhead Imagery.

Figure A.6: Track 6 on Overhead Imagery.

Figure A.7: Track 7 on Overhead Imagery.

Figure A.8: Track 8 on Overhead Imagery.

Figure A.9: Track 9 on Overhead Imagery.

Figure A.10: Track 10 on Overhead Imagery.

Figure A.11: Track 11 on Overhead Imagery.

Figure A.12: Track 12 on Overhead Imagery.

Figure A.13: Track 13 on Overhead Imagery.

Figure A.14: Track 14 on Overhead Imagery.

Figure A.15: Track 15 on Overhead Imagery.

Figure A.16: Track 16 on Overhead Imagery.

Figure A.17: Track 17 on Overhead Imagery.

Figure A.18: Track 18 on Overhead Imagery.

Figure A.19: Track 19 on Overhead Imagery.

Figure A.20: Track 20 on Overhead Imagery.

Figure A.21: Track 21 on Overhead Imagery.

Figure A.22: Track 22 on Overhead Imagery.